DOCUMENT RESUME



ED ^^2 2uo 



IR 005 144 



AUTHOR          Burton, Richard R.; Brown, John Seely
TITLE           Semantic Grammar: A Technique for Constructing
                Natural Language Interfaces to Instructional
                Systems.
INSTITUTION     Bolt, Beranek and Newman, Inc., Cambridge, Mass.
SPONS AGENCY    Advanced Research Projects Agency (DOD), Washington,
                D.C.
REPORT NO       BBN-3587; ICAI-5
PUB DATE        May 77
CONTRACT        MDA903-76-C-0108
NOTE            118p.

EDRS PRICE      MF-$0.83 HC-$6.01 Plus Postage.
DESCRIPTORS     Artificial Intelligence; Computer Assisted
                Instruction; Computer Programs; Educational
                Environment; *Grammar; *Information Processing;
                *Instructional Systems; Logical Thinking; *Man
                Machine Systems; *Programing Languages; *Semantics
IDENTIFIERS     Semantic Grammar; SOPHIE I



ABSTRACT

A major obstacle to the effective educational use of computers is the
lack of a natural means of communication between the student and the
computer. This report describes a technique for generating such
natural language front-ends for advanced instructional systems. It
discusses: (1) the essential properties of a natural language
front-end, (2) some prior systems having some of the desired
capabilities, and (3) the technical details underlying "semantic
grammars." This last notion is introduced as a paradigm for organizing
the knowledge required to understand language which permits efficient
parsing. In semantic grammar, non-terminal categories are formed on
conceptual rather than syntactic bases. This allows semantic knowledge
to be integrated into the parsing process whenever it is beneficial.
The ability of Augmented Transition Network (ATN)-based semantic
grammars to perform satisfactorily in an educational environment is
demonstrated in the natural language front-end for the SOPHIE system.
Appendices describe the SOPHIE semantic grammar in two formalisms (ATN
and BNF). (Author/DAG)




BBN Report No. 3587 
ICAI Report No. 5 






SEMANTIC GRAMMAR: A TECHNIQUE FOR CONSTRUCTING
NATURAL LANGUAGE INTERFACES TO INSTRUCTIONAL SYSTEMS

Richard R. Burton          John Seely Brown

May 1977

This research was supported in part by the Advanced Research Projects Agency,
Air Force Human Resources Laboratory, Army Research Institute for Behavioral
and Social Sciences, and Navy Personnel Research and Development Center under
Contract No. MDA903-76-C-0108.

The views and conclusions contained in this document are those of the authors
and should not be interpreted as necessarily representing the official policies,
either expressed or implied, of the U.S. Government.



Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER
   BBN Report No. 3587

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)
   Semantic Grammar: A Technique for Constructing
   Natural Language Interfaces to Instructional Systems

5. TYPE OF REPORT & PERIOD COVERED
   Technical Report

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)
   Richard R. Burton
   John Seely Brown

8. CONTRACT OR GRANT NUMBER(s)
   MDA903-76-C-0108

9. PERFORMING ORGANIZATION NAME AND ADDRESS
   Bolt Beranek & Newman Inc.
   50 Moulton Street
   Cambridge MA 02138

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS
    Defense Advanced Research Projects Agency
    1400 Wilson Boulevard
    Arlington VA 22209

12. REPORT DATE
    May 1977

13. NUMBER OF PAGES
    107

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
    Army Research Institute for Behavioral and Social Sciences
    5001 Eisenhower Avenue
    Alexandria VA

15. SECURITY CLASS. (of this report)
    Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
    Approved for public release; distribution unlimited

18. SUPPLEMENTARY NOTES
    This research was supported in part by the Advanced Research Projects
    Agency, Air Force Human Resources Laboratory, Army Research Institute for
    Behavioral and Social Sciences, and Navy Personnel Research & Development
    Center.

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
    Semantic Grammar, Natural Language Interfaces, Reactive Learning
    Environment, SOPHIE, Augmented Transition Networks, Habitability,
    Intelligent CAI

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
    One of the major stumbling blocks to the more effective educational
    use of computers is the lack of a natural means of communication between
    the student and the computer. This report addresses the problems of
    developing a system that can understand natural language (English) for
    advanced computer-based instructional systems. Training environments
    impose the following requirements on a natural language understanding
    system: (1) efficiency, (2) habitability, (3) self-teachability, and
    (4) awareness of ambiguity. The major leverage points that allow these
    requirements to be met are: (1) limited domain, (2) limited activities
    within that domain, and (3) known conceptualizations of the domain. In
    other words, we must know the problem area, the type of problem the
    student is trying to solve and the way he should be thinking about the
    problem in order to solve it.

    The notion of semantic grammar is introduced as a paradigm for
    organizing the knowledge required to understand language which permits
    efficient parsing. In semantic grammar, non-terminal categories are
    formed on conceptual rather than syntactic bases. This allows semantic
    knowledge to be integrated into the parsing process whenever it is
    beneficial. The semantic grammar also lends itself to a simple yet
    powerful method of handling pronominalizations, ellipses and other
    sentence fragments that arise naturally in a dialogue situation.

    The need for a succinct formalism for expressing semantic grammars
    led to the use of Augmented Transition Networks (ATN). The ability
    of ATN-based semantic grammars to perform satisfactorily in an
    educational environment is demonstrated in the natural language
    front-end for the SOPHIE system.

DD FORM 1473 (JAN 73)     EDITION OF 1 NOV 65 IS OBSOLETE

Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
TABLE OF CONTENTS

Abstract

Preface

CHAPTER 1 - REQUIREMENTS FOR A NATURAL LANGUAGE INTERFACE FOR
            INSTRUCTIONAL SYSTEMS
    Requirements

CHAPTER 2 - RELATED SYSTEMS
    Keyword Schemes
    PARRY
    NLPQ
    CONSTRUCT
    RENDEZVOUS
    LUNAR
    Discussion

CHAPTER 3 - SAMPLE DIALOGUE

CHAPTER 4 - SEMANTIC GRAMMAR
    Introduction
    Representation of Meaning
    Result of the Parsing
    Use of Semantic Information During Parsing
        Prediction
        Simple Deletion
        Ellipsis
    Using Context to Determine Referents
        Pronouns and Deletions
        Referents for Ellipses
    Limitations to the Context Mechanism
    Relationship to Other Semantic Systems
    Fuzziness
    Preprocessing
    Implementation

CHAPTER 5 - A NEW FORMALISM - SEMANTIC AUGMENTED
            TRANSITION NETWORKS
    Augmented Transition Networks
    Advantages of ATN Formalism
    Conversion to Semantic ATN
    Fuzziness
    Comparison of Results

CHAPTER 6 - OBSERVATIONS ON STUDENT USAGE
    Impressions, Experiences and Observations
    Feedback - When the Grammar Fails

CHAPTER 7 - CONCLUDING DISCUSSION
    Future Research Areas
    Conclusions

References

Appendix A: BNF Description of Part of the SOPHIE Semantic Grammar

Appendix B: A LISP Rule from the Semantic Grammar

Appendix C: Sample Parses and Parse Times for the
            LISP Implementation

Appendix D: Examples of ATN Compilation
    Version I
    Version II
    Trace of Version I Parsing a Sentence

    Tracing
    Breaks
    How to Get Into a Break

Appendix G: ATN Description of Part of the SOPHIE Semantic Grammar
    Graphic Form of Semantic ATN
    Input Form of Semantic ATN

Appendix E: Grammar Compiler Declarations
    Specification of Features
    Declarations for Arc Tests and Actions



ABSTRACT

One of the major stumbling blocks to the more effective educational
use of computers is the lack of a natural means of communication between
the student and the computer. This report addresses the problems of
developing a system that can understand natural language (English) for
advanced computer-based instructional systems. Training environments
impose the following requirements on a natural language understanding
system: (1) efficiency, (2) habitability, (3) self-teachability, and (4)
awareness of ambiguity. The major leverage points that allow these
requirements to be met are: (1) limited domain, (2) limited activities
within that domain, and (3) known conceptualizations of the domain. In
other words, we must know the problem area, the type of problem the student
is trying to solve and the way he should be thinking about the problem in
order to solve it.

The notion of semantic grammar is introduced as a paradigm for
organizing the knowledge required to understand language which permits
efficient parsing. In semantic grammar, non-terminal categories are formed
on conceptual rather than syntactic bases. This allows semantic knowledge
to be integrated into the parsing process whenever it is beneficial. The
semantic grammar also lends itself to a simple yet powerful method of
handling pronominalizations, ellipses and other sentence fragments that
arise naturally in a dialogue situation.

The need for a succinct formalism for expressing semantic grammars led
to the use of Augmented Transition Networks (ATN). The ability of
ATN-based semantic grammars to perform satisfactorily in an educational
environment is demonstrated in the natural language front-end for the
SOPHIE system.



Preface

With the advent of knowledge-based instructional systems that can
answer trainees' questions, critique their hypotheses and automatically
provide remedial hints, the need for a man-machine interface that
facilitates rather than hinders a student's communication with the machine
becomes ever more pressing. This report describes a general technique for
generating "friendly", efficient and robust natural language front-ends for
advanced instructional systems. The generality of this technique has been
proved by its successful application in a range of instructional systems;
its efficiency has turned out to rival the keyword parsers which underlie
most of the classical CAI systems; its robustness has been attested to by
the fact that it has been able to handle nearly every serious query posed
to our electronic instructional systems in the course of a lesson or
exercise.

In this report we first discuss the essential properties that comprise
a "friendly" natural language front-end for an instructional system. Next,
we discuss some prior systems that have some, but not all, of the desired
capabilities, and then we focus on the technical details underlying
"semantic grammars", a new technique for producing the desired
man-machine interfaces. Although there is little emphasis placed on the
analysis of how students used the capabilities afforded by this kind of
natural language interface (made possible by semantic grammars), a
companion report contains the analysis of nearly twelve thousand
natural-language interactions collected from students using instructional
systems built around this technique.



Chapter 1

REQUIREMENTS FOR A NATURAL
LANGUAGE INTERFACE FOR INSTRUCTIONAL SYSTEMS

This research arose from the need for natural language interfaces to
complex instructional systems which underlie reactive training environments.
As used here, the term "reactive training environment" refers to flexible
problem solving, laboratory-like situations that have been implemented on a
computer. The environment is reactive in the sense that the computer can
(in addition to implementing the laboratory) monitor the student's
activities and provide tutorial feedback during the solution of problems.
A characteristic of such systems is that computer-naive students are
involved in a training situation in which the computer is merely the
medium. Most certainly these students are not interested in state-of-the-art
man-machine communication; they must be free to concentrate on solving
their problems and learning from their solution paths and errors.

This instructional environment places constraints on a natural
language understanding system that exceed the capabilities of all existing
systems. These constraints include: (1) efficiency, (2) habitability, (3)
self-teachability and (4) the ability to exist with ambiguity. In the
remainder of this chapter we will explore why these are important, and then
provide an overview of the remainder of this report.

Requirements

A primary requirement for a natural language processor in an
instructional situation is speed. Imagine the following setting: the
student is at a terminal actively working on a problem. He decides that he
needs another piece of information to advance his solution, so he
formulates a query. Once he has finished typing his question, he will wait
for the system to give him an answer before he continues working on his
solution. During the time it takes the system to parse his query, the
student is apt to forget pertinent information and lose interest.
Psychological experiments have shown that response delays longer than two
seconds have serious effects on the performance of complex tasks via
terminals (Miller 1968). In these two seconds, the system must understand
the query; deduce, infer, look up or calculate the answer; and generate a
response.(1)

The second requirement for a natural language front-end is
habitability. Any natural language system written in the foreseeable future
is not going to be able to understand all of natural language. What it
must do is characterize and understand a usable subset of the language.
Watt (1968, p. 338) defines a "habitable" sub-language as "one in which its
users can express themselves without straying over the language boundaries
into unallowed sentences". Very intuitively, for a system to be habitable
it must, among other things, allow the user to make local or minor
modifications to an accepted sentence and get another accepted sentence.
Exactly how much modification constitutes a minor change has never been
specified. Some examples may provide more insight into this notion.

(1) Is anything wrong?

(2) Is there anything wrong?

(3) Is there something wrong?

(4) Is there anything wrong with section 3?

(5) Does it look to you like section 3 could have a problem?

If a problem solving system accepts sentence 1, it should also accept the
modifications given in sentences 2 and 3. Sentence 4 presents a minor
syntactic extension which may have major repercussions in the semantics but
which should also be accepted. Sentence 5 is an example of a possible
paraphrase of sentence 4 which is beyond the intended notion of
habitability. Based on the acceptance of sentences 1 through 4, the user
has no reason to expect that sentence 5 will be handled.

Any sub-language which does not maintain a high degree of habitability
is apt to be worse than no natural language capability at all, because in
addition to the problem he is seeking information about, the student is
faced, sporadically, with the problem of getting the system to understand
his query. This second problem can be disastrous both because it occurs
seemingly at random and because it is ill-defined. In an informal
experiment to test the habitability of a system, the authors asked a group
of four students to write down as many ways as possible of asking a
particular question. The original idea was to determine how many of the
various paraphrasings would be accepted. The students each came up with one
phrasing very quickly but had tremendous difficulty thinking of any others,
even though three of the first phrasings were different! This experience
demonstrates the students' lack of ability to do "linguistic" problem
solving and points out the importance of accepting the student's first
phrasing.

(1) Another effect of poor response time which is critical to intelligent
monitoring systems is that more of the student's searching for the answer
is done internally (i.e. without using the system). This decreases the
amount of information the tutoring system receives and increases the amount
of induction that must be performed, making the problem of figuring out
what the student is doing much harder (e.g. the student won't "show his
work" when solving a problem; he will just present the answer).

An equally important aspect of the habitability problem is the
multi-sentence (or dialogue) phenomenon. When students use a system that
exhibits "intelligence" through its inference capabilities, they quickly
start to assume that the system must also be intelligent in its
conversational abilities. For example, they will frequently delete
parts of their statements which they feel are obvious, given the context of
the preceding statements. Often they are totally unaware of such deletions
and show surprise and/or anger when the system fails to utilize contextual
information as clearly as they (subconsciously) do. The use of context
manifests itself in such linguistic phenomena as
pronominalizations, anaphoric deletions and ellipses. The following
sequence of questions exemplifies these problems:

(6) What is the population of Los Angeles?

(7) What is it for San Francisco?

(8) What about San Diego?
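
To make concrete what questions 6 through 8 demand of a system, the following is a minimal sketch in Python. It assumes that each question has already been reduced to the slots it mentions explicitly; the slot names (`attribute`, `city`) and the `resolve` function are hypothetical illustrations and are not taken from SOPHIE or from any system described in this report.

```python
from typing import Dict, Optional

# Hypothetical representation: a query is a set of named slots.
# A slot the user did not mention is absent or None.
Query = Dict[str, Optional[str]]

def resolve(partial: Query, context: Query) -> Query:
    """Fill every slot the user left out with its value from the previous query."""
    return {slot: partial.get(slot) or context.get(slot) for slot in context}

# (6) What is the population of Los Angeles?   (fully specified)
q6: Query = {"attribute": "population", "city": "Los Angeles"}

# (7) What is it for San Francisco?   ("it" stands for the attribute)
q7 = resolve({"attribute": None, "city": "San Francisco"}, q6)

# (8) What about San Diego?   (ellipsis: only the city is given)
q8 = resolve({"city": "San Diego"}, q7)

print(q7)
print(q8)
```

The sketch treats the pronoun in question 7 and the ellipsis in question 8 the same way, as missing slots inherited from the preceding query. Deciding which slot a fragment such as "San Diego" fills is the harder problem, and it is left out here.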

■ \- ■ ■ ■ ■ . • 

The third requirement for a natural language processor is that it be
self-teaching. As the student uses the system, he should begin to feel the
range and limitations of the sub-language. When the student uses a
sentence that the system can't understand, he should receive feedback that
will enable him to determine why it can't. There are at least two kinds of
feedback. The simplest (and most often seen) merely provides some
indication of what parts of the sentence caused the problem (e.g. unknown
word or phrase). A more useful kind of feedback goes on to provide a
response based on those parts of the sentence that did make sense and then
indicate (or give examples of) possibly related, acceptable sentences. It
may even be advantageous to have the system recognize common unacceptable
sentences and, in response to them, explain why they are not in the
sub-language. (See chapter 6 for further discussion of this point.)



The fourth requirement for a natural language system is that it be
aware of ambiguity. Natural language gains a good deal of flexibility and
power by not forcing every meaning into a different surface structure.
This means that the program that interprets natural language sentences
must be aware that more than one interpretation is possible. For example,
when asked:

(9) Was John believed to have been shot by Fred?

one of the most potentially disastrous responses is "Yes". The user may
not be sure whether Fred did the shooting or the believing or both. More
likely, the user, being unaware of any ambiguity, assumes an interpretation
that may be different from the system's. If the system's interpretation is
different, the user thinks he has received the answer to his query when in
fact he has received the answer to a completely independent query.
Either of the following is a much better response:

(10) Yes, it is believed that Fred shot John.
(11) Yes, Fred believes that John was shot.

The system need not necessarily have tremendous disambiguation skills, but
it must be aware that mis-interpretations are possible and inform the user
of its interpretation. In those cases where the system makes a mistake, the
results may be annoying but should not be catastrophic.
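
One way to picture "informing the user of its interpretation" is to build the answer from the reading the system chose instead of returning a bare "Yes". The following is a minimal sketch in Python; the `Reading` record, its field names and the wording of the negative answer are assumptions made for illustration, not a description of how any system in this report generates responses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    """One interpretation of 'Was John believed to have been shot by Fred?'"""
    believer: Optional[str]  # who does the believing (None: left unspecified)
    shooter: Optional[str]   # who did the shooting (None: left unspecified)
    victim: str

def paraphrase(r: Reading) -> str:
    """Restate the chosen reading in an unambiguous surface form."""
    shot = f"{r.shooter} shot {r.victim}" if r.shooter else f"{r.victim} was shot"
    if r.believer:
        return f"{r.believer} believes that {shot}"
    return f"it is believed that {shot}"

def respond(is_true: bool, r: Reading) -> str:
    # Always echo the reading, so a mismatch with the user's intent is visible.
    prefix = "Yes, " if is_true else "No, it is not the case that "
    return prefix + paraphrase(r) + "."

print(respond(True, Reading(believer=None, shooter="Fred", victim="John")))
print(respond(True, Reading(believer="Fred", shooter=None, victim="John")))
```

The two calls reproduce responses 10 and 11. The sketch does not disambiguate anything; it only makes the chosen reading visible, which is the property the requirement asks for.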

This report presents the development of a technique that we have named
"semantic grammars" for building natural language processors that satisfy
the above constraints. Chapter 2 discusses other systems which attack some
of these problems. Chapter 3 presents a dialogue from the "intelligent"
CAI system SOPHIE, which we used to refine and demonstrate this technique.
This dialogue provides concrete examples of the kinds of linguistic
capabilities that can be achieved using semantic grammars. Chapter 4
describes semantic grammar as it first evolved in SOPHIE, and points out
how it allows semantic information to be used to handle dialogue
constructs and to allow the directed ignoring of words in the input.
Chapter 5 discusses the limitations that were encountered in the evolution
of semantic grammars in SOPHIE as the range of sentences was increased and
how these might be overcome by using a different formalism, augmented
transition networks (ATN). Chapter 5 also reports on the conversion of the
SOPHIE semantic grammar to an ATN, and the extensions to the ATN formalism
which were necessary to maintain the solutions presented in chapter 4.
Chapter 5 also includes comparison timings between the two versions of the
natural language processor. Chapter 6 describes experiences we have had
with SOPHIE, and presents techniques developed to handle problems in the
area of non-understood sentences. Chapter 7 suggests directions for future
work.



Chapter 2

RELATED SYSTEMS

In this chapter we will describe a number of different techniques that
have evolved from research in the area of natural language understanding as
applied to practical tasks. Our purpose is to describe a set of techniques
that have been developed to handle natural language input throughout a
range of complexity. We also seek to dispel the idea that there is a
"natural language" as it applies to interfacing to computer systems, or
that there exists one "best" technique for every application.

KEYWORD SCHEMES

Perhaps the oldest and simplest method of dealing with unrestricted
natural language is keyword parsing. The technique was introduced
by Weizenbaum (1966a) and has been used and extended by others (e.g.,
Weizenbaum 1966b, Brown et al. 1973, Shapiro et al. 1975, Colby et al.
1974). Using this parsing scheme, an input sentence is searched for "key"
words. Each keyword is associated with a collection of patterns that are
then tested against the complete input. If a pattern matches, an action
associated with that pattern (typically a reassembly rule which constructs
an output sentence by reassembling pieces of the input) is executed. This
action represents the "meaning" of the sentence to the system (i.e. the
sentence's semantics).

Keyword analysis schemes have the advantage of being fast and of
allowing the user great freedom of expression, since any number of
extraneous words can be included as long as the keywords appear. A
particular parser can also be changed easily (by adding new rules) until
such time as the rules begin interacting, at which point it is unclear
which rule to use. When interactions do begin to occur, keywords can be
assigned an "importance" number and the rule with the highest number can be
used. However, conflicts may still arise when different keywords of equal
importance appear in the same sentence.
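
The scheme just described (keyword lookup, a pattern tested against the complete input, a reassembly rule, and an importance number to settle interactions) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two rules, their patterns and their response templates are invented for the example and are not taken from Weizenbaum's program or from any other system cited here.

```python
import re
from typing import Optional

# Each rule: (keyword, importance, pattern over the whole input, reassembly template).
# All four fields are hypothetical examples.
RULES = [
    ("mother", 2, re.compile(r".*\bmy mother (.*)", re.I),
     "Tell me more about why your mother {0}."),
    ("remember", 1, re.compile(r".*\bi remember (.*)", re.I),
     "Why do you remember {0}?"),
]

def respond(sentence: str) -> Optional[str]:
    text = sentence.strip().rstrip(".!?")
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Keep only rules whose keyword occurs; try the most important first.
    candidates = sorted((r for r in RULES if r[0] in words), key=lambda r: -r[1])
    for _keyword, _importance, pattern, template in candidates:
        match = pattern.match(text)
        if match:
            # Reassembly: pieces of the input are copied into the output.
            return template.format(*match.groups())
    return None  # no keyword found, or no pattern matched

print(respond("I remember that my mother disliked circuits."))
print(respond("Well, I remember the lab."))
```

In the first call both keywords appear, and the higher importance number selects the "mother" rule. If the two rules had equal importance, the outcome would depend only on the order of the rule list, which is the kind of conflict the text describes.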

Keyword techniques work well in situations where the actions that the
system wishes to take in response to a sentence correspond in a simple way
to the words (i.e. the concepts are not typically expressed as multiple
word phrases, and words do not have multiple interpretations). However,
they are weak in situations in which concepts are complex enough to
require embedding or in which quantification(2) is required, since their
semantic interpretation is essentially one level. In these cases, keyword
patterns become more cumbersome and inefficient to use than more structural
techniques. For example, consider the sentence:

(1) I think Q5 has an open emitter and a shorted base collector junction.

To recognize this sentence requires a very detailed keyword pattern which
could be "keyed" equally well, or equally poorly, off any of the words:
think, Q5, open, emitter, shorted, base or collector. The main failing of
the keyword technique is that it is incapable of capturing any of the
structure of the language it is trying to characterize.

PARRY

PARRY is an ongoing project to develop a dialogue system that simulates
paranoid behavior (Colby 1973, Colby et al. 1974). The system must respond
to any possible question and must "understand" the questions well enough to
exhibit paranoid behavior. To these ends, Colby has extended the keyword
parsing techniques introduced by Weizenbaum by adding a second level of
matching. After a preprocessing phase collapses compound words,
canonicalizes similar words, performs minor spelling correction and deletes
unrecognized words, the input is segmented at certain keyword
boundaries.(3) Each segment is then matched against a collection of
segment patterns. The resulting list of recognized segments is then
matched to a collection of complex patterns. Patterns have reassembly
rules associated with them that construct the response.

(2) Quantification refers to the problem of having a noun phrase that can
range over a set of values, e.g. "some cars have engines", "all cars have
engines". One of the problems with quantification is determining the scope
of the quantification with respect to the rest of the sentence, especially
when the rest of the sentence contains another quantifier.

(3) The fragmentation technique (which is critical to proper operation) was
developed by Wilks working in machine translation (1973a, 1973b). The list
of segmentation words includes punctuation marks, subjunctives,
conjunctions and prepositions.

Two important restrictions that should be placed on the application of
keyword schemes to avoid mis-understandings (i.e. to avoid having patterns
apply when they shouldn't) have arisen from Colby's work. One is that, at
most, one element should be ignored at each level of matching. Segment
matches should account for all but one word. Complex patterns should
account for all but one segment. The other restriction is that patterns
should require that their elements occur in a particular order. The
following example (from Colby et al. 1974) demonstrates the usefulness of
ignoring words such as "well" in sentence 3, and the importance of word
order; without word order restrictions, any pattern that matched 2 would
also match 3.

(2) Are you well?

(3) Well, are you?
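
The two restrictions can be illustrated with a small sketch in Python. It models only a single level of matching over words, not PARRY's segment and complex-pattern levels, and the function names and patterns are hypothetical; nothing here is taken from Colby's implementation.

```python
import re
from typing import List

def tokenize(sentence: str) -> List[str]:
    """Lower-case the input and keep only alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())

def ordered_match(pattern: List[str], words: List[str], max_ignored: int = 1) -> bool:
    """True if the pattern elements occur in order and at most
    `max_ignored` input words are left unaccounted for."""
    i = 0          # next pattern element to find
    ignored = 0    # input words not consumed by the pattern
    for w in words:
        if i < len(pattern) and w == pattern[i]:
            i += 1
        else:
            ignored += 1
    return i == len(pattern) and ignored <= max_ignored

def unordered_match(pattern: List[str], words: List[str]) -> bool:
    """A keyword test with no order restriction, for contrast."""
    return set(pattern) <= set(words)

well_query = ["are", "you", "well"]
print(ordered_match(well_query, tokenize("Are you well?")))     # True
print(ordered_match(well_query, tokenize("Well, are you?")))    # False
print(unordered_match(well_query, tokenize("Well, are you?")))  # True
print(ordered_match(["are", "you"], tokenize("Well, are you?")))  # True
```

With the order restriction, the pattern for sentence 2 no longer fires on sentence 3, while a shorter pattern can still accept sentence 3 by ignoring the single word "well". Ignoring two or more words is rejected.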

PARRY has demonstrated the capability of dealing with a relatively
large number of concepts at a shallow level. The power in PARRY's approach
lies in its ability to tolerate unknown words. As mentioned, this
fuzziness is implemented by allowing the deletion of single elements from
both levels of matching. Unfortunately the underlying semantics of PARRY's
task, indeed the goals of the task itself, are vague, which makes
attributes such as scope and habitability hard to evaluate. Furthermore,
the two-level pattern matching technique lacks the precision required in a
problem solving situation in which many regularities cannot be captured by
one-level embedding.

.... V 

NLPQ

Heidorn (1972, 1974, 1975) developed an automatic programming system
called NLPQ which allows users to describe simulation problems in English.
The system takes an English partial description of a problem and fits it
into an internal description language, building pieces of the problem.
From the partial internal description, questions are generated that request
missing pieces of information. When the description is complete, the
system can generate a GPSS program or an English description of the model
it has built from the user's description. The user can also ask questions
about the present model, and make changes and additions to it. The English
processing is done using augmented phrase structure rules. The phrase
structure component is syntax-based (it looks for things like noun
phrases), with semantic restrictions being carried along in features that
are tested in conditions on the phrase structure rules. The structure
building augmentations create semantic/conceptual network structures,
called segments, that represent the semantics of the phrase. Much of the
system's success appears to be due to the close match between the structure
of segments and the way English is used to describe modelling problems. No
information on the use of NLPQ by naive users has been published, so it is
difficult to evaluate the system's habitability.

CQNSTauCT " . ' , . 

CONSTRUCT is a general system to do natural language processing
developed at the Institute for Mathematical Studies in the Social Sciences
at Stanford University (Smith et al. 1974). Its major application is in a
text-based question answering system for elementary mathematics (Smith,
N.W. 1974). The system answers questions such as:

     (4) Are there any even prime numbers that are greater than 2?

     (5) Is the sum of 5 and 2 less than the product of 5 and 2 but greater
         than the difference of 5 and 2?

The semantic basis of the system is a collection of procedures for
generating and manipulating sets and numbers. The semantics of question 4
would be "are there any elements in the set created by intersecting the set
of even numbers, the set of prime numbers and the set of numbers greater
than 2?" As all of the sets in the example are infinite, the procedures
know about dealing with intensional as well as extensional descriptions of
sets.
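
The idea of an intensional set description can be illustrated with a short
Python sketch. It is not CONSTRUCT's implementation, which the report does
not describe; the function names and the bounded search are assumptions
made for illustration. Each infinite set is represented by a membership
test, and intersection becomes the conjunction of those tests.

```python
# Intensional sets: each set is a membership predicate, not a list of elements.
def even(n):
    return n % 2 == 0

def prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def greater_than(k):
    return lambda n: n > k

def intersect(*preds):
    """Intersection of intensional sets is the conjunction of their predicates."""
    return lambda n: all(p(n) for p in preds)

def witnesses(pred, limit):
    """Members found among 0..limit-1. A bounded search, not a proof of emptiness."""
    return [n for n in range(limit) if pred(n)]

question_4 = intersect(even, prime, greater_than(2))
print(witnesses(question_4, 1000))              # []
print(witnesses(intersect(even, prime), 1000))  # [2]
```

A bounded search can only fail to find a member; a system answering
question 4 with a definite "no" would need to reason about the descriptions
themselves.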

The meaning of a sentence is determined by the following process.
First a preprocess phase occurs during which (1) abbreviations are
expanded, (2) synonyms are canonicalized, (3) compound words and common
phrases are collapsed to a single word representation, (4) noise words are
eliminated and (5) each word is replaced by its lexical category. The
input is then parsed with a context-free grammar with the semantic
interpretation occurring in parallel via semantic construction functions
associated with each grammar rule. This procedure would clearly be
inadequate if a traditional syntactic grammar were used, since no
reasonable semantic function could be associated with the rule S := NP VP.
The CONSTRUCT grammar, however, is built around the semantic rules using
categories that capture concepts in the application domain. For example,
the grammar contains the grammatical category SUBST which corresponds to
the semantic concept of a constructive set. This cuts across traditional
category boundaries as seen in the sentences from (Smith et al. 1974):

     Is 2 a factor of 4?

     How many factors of 12 are even?

     Give me the factors of 12 that are between [illegible] and [illegible].

The underlined portions would all be parsed into the SUBST category,
although their traditional categories would be noun phrase, adjective, and
prepositional phrase.
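
The five preprocessing steps listed above can be sketched in Python as
follows. Every table entry and category name here is invented for
illustration; the report does not give CONSTRUCT's actual abbreviation,
synonym, phrase, or lexicon tables.

```python
# All tables are hypothetical stand-ins, not CONSTRUCT's data.
ABBREVIATIONS = {"no.": "number", "nos.": "numbers"}
SYNONYMS = {"bigger": "greater", "larger": "greater"}
PHRASES = {("greater", "than"): "greater_than",
           ("prime", "numbers"): "prime_numbers"}
NOISE = {"there", "any", "that", "the"}
LEXICON = {"are": "BE", "even": "ADJ", "prime_numbers": "SET",
           "greater_than": "REL", "2": "NUM"}

def preprocess(sentence):
    words = sentence.lower().rstrip("?").split()
    words = [ABBREVIATIONS.get(w, w) for w in words]   # (1) expand abbreviations
    words = [SYNONYMS.get(w, w) for w in words]        # (2) canonicalize synonyms
    collapsed, i = [], 0
    while i < len(words):                              # (3) collapse two-word phrases
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            collapsed.append(PHRASES[pair])
            i += 2
        else:
            collapsed.append(words[i])
            i += 1
    kept = [w for w in collapsed if w not in NOISE]    # (4) drop noise words
    return [LEXICON.get(w, "UNKNOWN") for w in kept]   # (5) lexical categories

print(preprocess("Are there any even prime numbers that are larger than 2?"))
# ['BE', 'ADJ', 'SET', 'BE', 'REL', 'NUM']
```

The output of step (5) is the category string that a context-free grammar
of the kind described would then parse.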

RENDEZVOUS 

Codd (1974) is designing a natural language system, called RENDEZVOUS,
to support the needs of casual users of data bases. One problem that Codd
has addressed, which has been neglected in previous systems, is what action
to take if a user's query is beyond the restricted language understood by
the system. A central notion in Codd's proposed solution to this problem
is that of a "clarification dialogue", a system-initiated dialogue that
includes queries about an unacceptable utterance that attempt to arrive at
the user's meaning. Codd points out that a clarification dialogue must be
embarked upon very carefully. For example, if the system encounters the
unknown word "concerning", one of the worst possible responses is "What do
you mean by the word 'concerning'?" Almost any response to such a question
would be beyond the capabilities of the system. Any clarification dialogue
must be of "bounded scope" and guided by those parts of the query which the
system can understand. RENDEZVOUS also employs re-statement of a user's
query to confirm the intent of the query and to point out ambiguities. The
range of language accepted by RENDEZVOUS, indeed even the method used to
extend the range, is unclear. The aspect of RENDEZVOUS that is of interest
here is the extent to which it has been designed as a "friendly" system.






LUNAR

The LUNAR system (Woods 1973a; Woods et al. 1972) is a natural language
understanding implementation that combines a general semantic
interpretation mechanism (Woods 1967, 1968) with a large scale grammar of
English (Woods 1970; Woods et al. 1972). LUNAR was designed to allow a
lunar geologist to use English to query the chemical analysis data
collected from the moon missions. Typical questions the system answers
are:

     What is the average concentration of aluminium in high alkali rocks?
     Which samples have greater than 20% modal plagioclase?

The processing of a query occurs in three major phases. During the
first phase, the syntactic component derives the "deep structure" of the
sentence. (4) The syntactic component uses a general transformational
grammar of English syntax expressed as an augmented transition network (see
Chapter 5). In the second phase a general, rule-driven semantic
interpretation procedure produces the representation of the meaning of the
sentence as a program in a formal retrieval language. (5) The semantic
interpretation rules are tree-structured pattern-matching rules that are
used in groups to extract the meaning of different pieces of the syntax
tree. The third phase is the execution of the formal expression to produce
the answer to the request. The formal query language is a generalization
of the predicate calculus that has been carefully designed to allow natural
translation from English. The strength of the LUNAR system lies in its
mechanisms to deal with quantification, conjunction, and relative clauses,
and these are direct results of the carefully designed formal query
language.

Discussion

The notion of an augmented phrase structure grammar provides a useful
base for comparison between these systems. (6) An augmented phrase
structure grammar contains two components. One is a set of context-free
phrase structure rules. The other is a corresponding set of functions,
sometimes arbitrary, sometimes restricted, augmenting each of the rules,
that can be used to block the application of the context-free rules and to
maintain structures. While the paradigm of augmenting phrase structure
grammars is followed by a large number of natural language systems,
important differences exist with respect to what type of information is
encoded in the grammar. For example, the LUNAR system uses a purely
syntactic grammar (7) and uses the augments to perform syntactic operations
such as subject-verb agreement and to maintain the structure of the
syntactic tree. NLPQ uses a syntactic grammar restricted by usually
semantic features and uses the augments to perform parallel semantic
interpretation. CONSTRUCT performs the semantic interpretation in parallel
with a set of context-free rules that are semantically oriented. PARRY's
patterns, if viewed as limited phrase-structure grammar rules, are
directly linked to the semantics of the system. The decision about how
much semantic information to encode in the grammar is a trade-off between
efficiency and generality. Each of the systems presented here represents a
defensible position along this spectrum.

(4) This is the linguistic deep structure hypothesized by Chomsky (Chomsky
1965) which has a central role in the theory of transformational grammar.

(5) The notion that the meaning of a sentence is a program is generally
called "procedural semantics". Procedural semantics is in general use for
question answering applications. It does not, however, constitute a
complete theory of meaning. In particular it does not account for such
phenomena as declaratives, uses of temporal references, and belief
structures.

(6) The idea of associating additional information with a phrase structure
grammar has appeared in various forms since early compiling systems (Irons
1961).
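
To make the two components of an augmented phrase structure grammar
concrete, here is a minimal Python sketch. The class, the feature names,
and the toy constituents are assumptions for illustration, not the
representation used by any of the systems discussed. The augment shown is
of the syntactic kind attributed above to LUNAR: it blocks the rule when
subject and verb disagree in number and otherwise builds a structure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AugmentedRule:
    lhs: str
    rhs: tuple          # context-free part: categories of the constituents
    augment: Callable   # returns a structure, or None to block the rule

def agree(np, vp):
    """Syntactic augment: require number agreement, then build the tree node."""
    if np["number"] != vp["number"]:
        return None
    return {"cat": "S", "subject": np, "predicate": vp}

S_RULE = AugmentedRule("S", ("NP", "VP"), agree)

def apply_rule(rule, constituents):
    cats = tuple(c["cat"] for c in constituents)
    if cats != rule.rhs:            # the context-free match fails
        return None
    return rule.augment(*constituents)

np = {"cat": "NP", "number": "sg", "head": "sample"}
vp_ok = {"cat": "VP", "number": "sg", "head": "contains"}
vp_bad = {"cat": "VP", "number": "pl", "head": "contain"}
print(apply_rule(S_RULE, [np, vp_ok]) is not None)  # True
print(apply_rule(S_RULE, [np, vp_bad]))             # None
```

Replacing `agree` with a function that tests semantic features or builds a
semantic structure moves the same rule toward the NLPQ or CONSTRUCT end of
the spectrum.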

When we began developing the SOPHIE system (8) we explored the
possibility of using, intact, the syntactic component of the LUNAR system.
Since the LUNAR syntactic component was building a linguistically motivated
description, as opposed to the task oriented descriptions being built by
the other systems, we felt its transferability to other domains would be
high. We found the grammar to be very adequate, parsing many of the most
complicated sentences we felt SOPHIE would ever need to understand.
Unfortunately, on simple sentences it provided more information about the
sentence than we needed. For example, tense information was seldom needed
and in those cases where needed, it could be extracted from the
relationships between concepts. The quantification and relative clause
mechanisms were oriented towards Woods' formal query language which was not
natural for our use. The use of conjunction in our domain is
straightforward and relatively predictable, unlike its use in the LUNAR
domain. All in all we had the feeling of using a microscope when we only
needed a magnifying glass! The underlying semantic structure of our system
could not take advantage of such detail. Added detail is acceptable (it
can always be ignored) except that the perception of such detail takes
time, which is a scarce commodity. The LUNAR system was taking 2 or 3
seconds to syntactically parse a sentence and another 5 to semantically
interpret it. This experience led us to explore ways in which the
semantics of the system could be used to speed the understanding process.

(7) The augmented transition network is an extension of a recursive
transition network that has the power of a phrase structure grammar. For
this reason we can classify it here as using an augmented phrase structure
grammar. We will argue later that the transition network has conceptual
advantages over phrase structure rules, but this does not affect this
discussion, which points out the difference in the kind of information
captured in the grammar.

(8) A SOPHisticated Instructional Environment for teaching electronic
troubleshooting. Chapter 3 provides examples of SOPHIE's language
requirements.

The technique we developed (described in Chapter 4) has much in common
with both NLPQ and CONSTRUCT. However, significant differences arise from
the emphasis we have placed on dealing with dialogues, and on the
construction of a friendly system. This has caused us to exploit two uses
of semantics (during parsing) not found in these other systems. One is the
insight provided into the nature of ellipsis and deletion in dialogues.
The other is the basis provided for characterizing a habitable language.
In Chapter 4, we shall discuss our concept of a semantic grammar, and how it
allows exploitation of these two advantages. Before we get into the
details of how this is accomplished, we present in the next chapter an
example of what has been accomplished.




Chapter 3 
SAMPLE DIALOGUE 



Before delving into the structural aspects and technical details of
the semantic grammar technique, we would first like to provide a concrete
example of the dialogues it has supported. This chapter presents an
annotated dialogue of a student using the "Intelligent" CAI system
SOPHIE. (9) SOPHIE was developed to explore the use of artificial
intelligence techniques in providing tutorial feedback to students engaged
in problem solving activities. The particular problem solving activity
that SOPHIE is concerned with is the troubleshooting of a malfunctioning
piece of electronic equipment. SOPHIE models the piece of equipment and
answers the student's requests for measurements and other information to
aid him in debugging the equipment. More important, throughout the problem
solving session, SOPHIE can evaluate the logical consistency of a student's
hypothesis or generate hypotheses which are consistent with the behavior
the student has thus far observed. (10) In the dialogue, the student's
typing is underlined. Even though the dialogue deals with electronic
jargon, the linguistic issues it exemplifies occur in all domains. The
annotations (lower case, indented) attempt to point out these problems and
should be understandable to the non-electronics oriented reader.

WELCOME TO SOPHIE - A SIMULATED ELECTRONICS LABORATORY.

     The circuit (Figure 3.1) is based on the Heathkit IP-28 power supply.
     The IP-28 is a reasonably sophisticated power supply with both current
     limiting and voltage limiting behavior. These two interrelated
     feedback loops make troubleshooting this circuit non-trivial.

>> INSERT A FAULT

     The student tells SOPHIE to give him a fault which he can
     troubleshoot. SOPHIE randomly selects a fault, inserts it into a model
     of the instrument and tells the student how the front panel controls
     are presently set.

THE INSTRUMENT HAS A FAULT AND ITS PRESENT CONTROL SETTINGS ARE:
CC      1.0     CURRENT CONTROL-FROM 0.0 (MIN CURRENT) TO 1.0
CR      HIGH    CURRENT RANGE SWITCH, HIGH=1 AMP, LOW=.1 AMP
LOAD    1000    LOAD RESISTANCE IN OHMS
VC      1.0     VOLTAGE CONTROL-FROM 0.0 (MIN VOLTAGE) TO 1.0
VR      HIGH    VOLTAGE RANGE SWITCH, HIGH=30, LOW=10 VOLTS



(9) The dialogue is intended to demonstrate SOPHIE's linguistic
capabilities and, while it touches upon the major features of SOPHIE, it is
not meant to exhibit the logical or deductive capabilities of the system.

(10) The reader is encouraged to see (Brown and Burton 1975) for further
examples and descriptions of SOPHIE's tutorial and inferential
capabilities.

Figure 3.1
Circuit Schematic of Heathkit IP-28 Power Supply



>> WHAT IS THE OUTPT?
OUTPT => OUTPUT
THE OUTPUT VOLTAGE IS 1.2 VOLTS.

     SOPHIE corrects the student's typing mistake.
     Notice that "the output" was disambiguated as the output voltage.

IF YOU ARE NOT SURE WHETHER A VALUE IS CORRECT, YOU CAN ASK "IS THAT
VALUE CORRECT?"

     The system points out a feature that the student may not otherwise be
     aware of. "Correct" here means what it is in a working instrument.

>> IS THAT RIGHT
IN A WORKING CIRCUIT THE OUTPUT VOLTAGE IS 30.1 VOLTS.
THAT VALUE IS INCORRECT.

     "That" here refers to the value of the previous measurement. SOPHIE,
     in addition to the faulted instrument, has a working circuit which can
     be used to take reference measurements.

>> WHAT IS THE VOLTAGE BETWEEN THE CURRENT LIMITING TRANSISTOR
.... AND THE CONSTANT CURRENT SOURCE?
THE VOLTAGE AT N4 IS 2.48 VOLTS.

     In addition to specific part names, SOPHIE recognizes references to
     parts by their section. Notice that although the sentence says
     "voltage between x and y", it really means "voltage at z with respect
     to ground". The output routines rephrase the input in a simpler form
     (i.e. between x and y reduces to N4).

>> WHAT IS IT BETWEEN THERE AND THE BASE OF Q6
THE VOLTAGE BETWEEN N4 AND N5 IS 1.28 VOLTS.

     Here "it" refers to "the voltage" and "there" refers to N4.
     Also notice that "between" has to be treated differently,
     i.e. two nodes are explicit now.

>> IS THERE ANY RIPPLE IN THE VOLTAGE AT THE BASE OF Q4?
I'M SORRY, I DO NOT HANDLE AC CHARACTERISTICS.

     SOPHIE recognizes that the student is moving into an area that it is
     unable to handle. This is currently done by keyword recognition (e.g.
     ripple) but points out a technique that should be incorporated in
     other aspects of the system as well.

>> WHAT IS THE BASE CURRENT OF Q3
THE BASE CURRENT OF Q3 IS 12 MICROAMPS.

     SOPHIE can answer questions about current, resistance, and power
     dissipation as well as voltage. The output and input routines know
     about unit prefixes.

>> THRU THE EMITTER?
THE EMITTER CURRENT OF Q3 IS -.26 MILLIAMPS.

     This is a type of sentence fragment that occurs frequently in
     dialogues that we will refer to as an ellipsis. Rather than
     specifying a complete thought, an ellipsis only gives that piece of
     the thought which contrasts an earlier thought. Notice that the
     referent cannot be determined on syntactic grounds (as is possible
     with the ellipsis "the emitter current"), yet this is the sort of
     statement a student might make if the last answer had caused a slight
     shifting of his thinking.

>> COULD THE PROBLEM BE THAT Q5 IS BAD?
PLEASE BE MORE SPECIFIC ABOUT HOW Q5 IS FAULTED.

     The student develops an idea about what is wrong and asks to have it
     evaluated.

>> COULD IT BE SHORTED?
THE SHORTED MODES OF A TRANSISTOR CURRENTLY IMPLEMENTED ARE: ALL
JUNCTIONS SHORTED OR ONE JUNCTION SHORTED WITH THE OPPOSITE
TERMINAL MODELLED OPEN.

     A hypothesis must be completely (or non-ambiguously) specified before
     evaluation can take place. Even though the last statement generated
     an error condition, it is still used to determine possible pronoun
     referents.



SHOULD Q5 HAVE ALL JUNCTIONS SHORTED?
> NO
WHICH TERMINAL SHOULD BE MODELLED OPEN?
> THE BASE
OK, I'LL LET Q5 HAVE ITS EMITTER COLLECTOR JUNCTION SHORTED AND ITS BASE OPEN

     At this point, SOPHIE looks back over the student's measurements to
     determine if he should have "known better". See (Brown and Burton
     1975) or (Brown et al. 1975) for further description.

HMMM, LET ME COMPUTE ON THAT FOR A MINUTE.

     When the system knows there may be a long wait, a "delaying" message is
     printed to let the student know something is happening.

WHEN THE SETTINGS WERE:
     CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH,
THAT FAULT CONFLICTS WITH THE FOLLOWING MEASUREMENT.
YOU OBSERVED THAT THE VOLTAGE BETWEEN N4 AND N5 WAS 1.28 VOLTS.
IF Q5 HAD ITS EC SHORTED AND ITS BASE OPEN IT WOULD BE 1.1 MILLIVOLTS.
IN A WORKING CIRCUIT IT IS 1.43 VOLTS.

BUT THAT FAULT DOES EXPLAIN THE VALUES YOU OBSERVED FOR
THE OUTPUT VOLTAGE, THE VOLTAGE AT N4, THE VOLTAGE BETWEEN N4 AND N10,
THE EMITTER CURRENT OF Q3 AND THE BASE CURRENT OF Q3.

>> WHAT COULD BE WRONG

     The student cannot go further and asks for assistance. SOPHIE
     proposes some possible hypotheses for the student to explore.

THERE ARE LOTS OF FAULTS WHICH WOULD EXPLAIN THE MEASUREMENTS YOU
HAVE TAKEN. HERE ARE SOME POSSIBILITIES:
THE RESISTANCE OF R8 CHANGES
THE LEAKAGE RESISTANCE OF C4 CHANGES
THE BETA OF Q3 CHANGES
R15 IS OPEN
Q6 IS FAULTED

>> IF R8 WERE 30K, WHAT WOULD THE OUTPUT VOLTAGE BE?
IF THE VALUE OF R8 WERE 30000
UNDER THE PRESENT SETTINGS, THE OUTPUT VOLTAGE WOULD BE .904 VOLTS.
IN A WORKING CIRCUIT IT IS 30.1 VOLTS.

     The student doesn't believe one of the possibilities so he checks it
     using a hypothetical statement. Even though the voltage in this case
     and that in the case of a faulted circuit are not the same, they
     qualitatively agree.



>> REPLACE Q6

     The student guesses maybe Q6 is bad and tries to replace it. When the
     student replaces a part SOPHIE queries him to insist that he know how
     it is faulted.

I AM GOING TO ASK YOU SOME QUESTIONS ABOUT HOW Q6 IS FAULTED.
ARE ANY JUNCTIONS SHORTED?
> YES
WHICH JUNCTIONS ARE SHORTED?
> ALL

     If the response to a query is not understood, further specification of
     the expected response is given.

PLEASE SPECIFY THE JUNCTIONS AS EC EB AND CB (SEPARATED BY SPACES).
WHICH JUNCTIONS ARE SHORTED?
> EC EB AND CB
Q6 IS NOT FAULTED IN THAT MANNER.

     The part would be replaced if any of the junctions had been shorted.
     SOPHIE would normally have critiqued the student's hypothesized fault.
     This was not included in the dialogue to save space.

Q6 NOT REPLACED.

>> VOLTAGE DROP ACROSS R5
THE VOLTAGE ACROSS R5 IS -.028 MILLIVOLTS.

     Some noun phrase utterances are recognized as questions.

>> WHAT ABOUT R3?
THE VOLTAGE ACROSS R3 IS 71.6 VOLTS.

     Another ellipsis. The context of the previous question is used to
     determine "what" about R3.

>> REPLACE R3
IS R3 OPEN OR SHORTED OR DOES IT HAVE THE WRONG VALUE?
> IT IS OPEN
R3 REPLACED

>> IS THERE ANYTHING ELSE WRONG
NO

     After fixing the fault, the student uses SOPHIE to explore other
     aspects of the circuit's behavior.

>> IF C2 WERE LEAKY, WHAT WOULD THE VOLTAGE ACROSS IT BE?

     The referent to "it" in this case occurs earlier in the same sentence.

IS IT OKAY IF I USE 10000 FOR THE LEAKAGE RESISTANCE OF C2?
> YES
IF THE LEAKAGE RESISTANCE OF C2 WERE 10000
UNDER THE PRESENT SETTINGS, THE VOLTAGE ACROSS C2 WOULD BE 26.3 VOLTS.
IN A WORKING CIRCUIT IT IS 28.9 VOLTS.

>> WHAT HAPPENS IF C1 SHORTS?
UNDER THE SETTINGS:
     CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH,
IF C1 WERE SHORTED, D3 WOULD BECOME OVERLOADED.

     In the case where nothing more interesting happens, "what happens"
     questions default to the output voltage.

>> IF Q6 HAS AN OPEN EMITTER AND A SHORTED BASE COLLECTOR JUNCTION
.... WHAT HAPPENS TO THE VOLTAGE BETWEEN ITS BASE AND THE JUNCTION OF
.... THE VOLTAGE LIMITING SECTION AND THE VOLTGEA REFERENCE SOURCE?
VOLTGEA => VOLTAGE

     Several examples of the use of conjunction. Imagine the student's
     frustration if he is forced to type this query over because of
     spelling mistakes.

IF Q6 HAD ITS BASE COLLECTOR JUNCTION SHORTED AND ITS EMITTER OPEN,
UNDER THE PRESENT SETTINGS:
     CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH
THE VOLTAGE BETWEEN N5 AND N13 WOULD BE THE SAME, I.E. 28.9 VOLTS.

     A hypothetical question is first tried under the present settings. If
     no difference is noted, other settings are tried.

LET ME TRY ANOTHER CASE WHICH USUALLY POINTS OUT PROBLEMS IN THAT
SECTION OF THE CIRCUIT.
UNDER THE SETTINGS:
     CC=.9, CR=LOW, LOAD=10, VC=1.0, VR=LOW,
THE VOLTAGE BETWEEN N5 AND N13 WOULD BE 8.94 VOLTS.
IN A WORKING CIRCUIT IT IS 9.57 VOLTS.

>> WHAT IS THE OUTPUT VOLTAGE WHEN THE VOLTAGE CONTROL IS SET TO .5?
WITH THE VOLTAGE CONTROL NOW SET AT .5
THE OUTPUT VOLTAGE IS 15.6 VOLTS.

     A hypothetical question can involve the changing of a control. If
     this happens the control is left at the new setting and the student is
     told so.

>> IF IT IS SET TO .7?
WITH THE VOLTAGE CONTROL NOW SET AT .7
THE OUTPUT VOLTAGE IS 21.4 VOLTS.

     Context determines which measurement should be taken and which control
     should be set.

>> .9?
WITH THE VOLTAGE CONTROL NOW SET AT .9
THE OUTPUT VOLTAGE IS 27.2 VOLTS.

>> WHAT IS THE CURRENT THRU THE CC WHEN THE VC IS SET TO 1.0?
WITH THE VOLTAGE CONTROL NOW SET AT 1.0
THE CURRENT THRU THE CURRENT CONTROL IS 30 MILLIAMPS.

     CC and VC are abbreviations. Notice that controls are also "parts"
     that can be used in measurements.

>> WHAT IS IT WITH IT SET TO .8?
WITH THE VOLTAGE CONTROL NOW SET AT .8
THE CURRENT THRU THE CURRENT CONTROL IS 24 MILLIAMPS.

     Here the system must correctly instantiate two occurrences of "it".
     The referent to the second "it" could be either CC or VC since these
     are both controls.

>> GOODBYE
GOODBYE
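
The exchanges about the voltage control show fragments such as "IF IT IS
SET TO .7?" being completed from earlier requests. The Python sketch below
shows one simple way such completion could work. It is only an
illustration under the assumption that the most recent measurement and
control are remembered; the class and field names are hypothetical and
this is not a description of SOPHIE's mechanism.

```python
class DialogueContext:
    """Remembers the most recent measurement and control mentioned."""

    def __init__(self):
        self.measurement = None
        self.control = None

    def resolve(self, setting, measurement=None, control=None):
        """Fill the omitted pieces of a request from context, then update it."""
        measurement = measurement or self.measurement
        control = control or self.control
        if measurement is None or control is None:
            raise ValueError("no earlier request to complete this fragment")
        self.measurement, self.control = measurement, control
        return {"measure": measurement, "set": control, "to": setting}

ctx = DialogueContext()
print(ctx.resolve(0.5, "OUTPUT VOLTAGE", "VOLTAGE CONTROL"))  # full question
print(ctx.resolve(0.7))   # "IF IT IS SET TO .7?"
print(ctx.resolve(0.9))   # ".9?"
```

The last exchange in the dialogue, where "it" could refer to either CC or
VC, shows why a single remembered control is not sufficient in general.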
Chapter 4
SEMANTIC GRAMMAR



INTRODUCTION

In Chapter 1 we described the requirements for a natural language
processor in a learning environment. Briefly, they are efficiency and
friendliness over the class of sentences that arise in a dialogue
situation. The major leverage points we have that allow us to satisfy
these requirements are (1) limited domain, (2) limited activities within
that domain, and (3) known conceptualizations of the domain. In other
words, we know the problem area, the type of problem the student is trying
to solve, and the way he should be thinking about the problem in order to
solve it. What we are then faced with is taking advantage of these
constraints in order to provide an effective communication channel.

Notice that all of these constraints relate to concepts underlying the
student's activities. In SOPHIE, the concepts include voltage, current,
parts, transistors, terminals, faults, particular parts (e.g. R9, Q5,
etc.), hypotheses, controls, settings of controls, and so on. The
(dependency) relationships between concepts include things such as:
voltage can be measured at terminals, parts can be faulted, controls can be
set, etc. The student, in formulating a query or statement, is requesting
information or stating a belief about one of these relationships (e.g.
"What is the voltage at the collector of Q5" or "I think R9 is open"). It
occurred to us that the best way to characterize the statements used for
this task was in terms of the concepts themselves as opposed to the
traditional syntactic structures. The language can be described by a set
of grammar rules that characterize, for each concept or relationship, all
of the ways of expressing it in terms of other constituent concepts. For
example, the concept of a measurement requires a quantity to be measured
and something against which to measure it. A measurement is typically
expressed by giving the quantity followed by a preposition, followed by the
thing that specifies where to measure (e.g. "voltage across C2", "current
thru D1", "power dissipation of R9", etc.). These phrasings are captured in
the grammar rule: (11)

     <MEASUREMENT> := <MEASUREABLE/QUANTITY> <PREP> <PART>

The concept of a measurement can, in turn, be used as part of other
concepts, e.g. to request a measurement "What is the voltage across C2?",
or to check a measurement "Is the current thru D1 correct?". We call this
type of grammar a "semantic grammar" because the relationships it tries to
characterize are semantic/conceptual as well as syntactic.
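
As a rough illustration of a rule whose categories are concepts rather than
syntactic classes, the Python sketch below recognizes the suggestive
<MEASUREMENT> rule and embeds it in a request. The word lists and function
names are assumptions made for the example and are not taken from SOPHIE's
grammar.

```python
# Hypothetical vocabularies for the three conceptual categories in the rule.
QUANTITIES = {"voltage", "current", "resistance", "power dissipation"}
PREPS = {"across", "thru", "of", "at"}
PARTS = {"C2", "D1", "R9", "Q5"}

def parse_measurement(phrase):
    """<MEASUREMENT> := <MEASUREABLE/QUANTITY> <PREP> <PART>; None if no match."""
    words = phrase.split()
    if len(words) < 3:
        return None
    quantity = " ".join(words[:-2]).lower()
    prep, part = words[-2].lower(), words[-1].upper()
    if quantity in QUANTITIES and prep in PREPS and part in PARTS:
        return {"quantity": quantity, "prep": prep, "part": part}
    return None

def parse_request(sentence):
    """A request embeds a measurement: 'What is the <MEASUREMENT>?'"""
    prefix = "what is the "
    s = sentence.rstrip("?").strip()
    if s.lower().startswith(prefix):
        m = parse_measurement(s[len(prefix):])
        if m:
            return {"request": m}
    return None

print(parse_measurement("power dissipation of R9"))
print(parse_request("What is the voltage across C2?"))
print(parse_measurement("voltage across the moon"))   # None
```

Because <PART> only admits known parts, the third phrase is rejected during
parsing rather than by a later semantic check.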

Semantic grammars have two advantages over traditional syntactic
grammars. They allow semantic constraints to be used to make predictions
during the parsing process, and they provide a useful characterization of
those sentences that the system should try to handle. The predictive
aspect is important for four reasons: (1) It reduces the number of
alternatives that must be checked at a given time; (2) it reduces the
amount of syntactic (grammatical) ambiguity; (3) it allows recognition of
ellipsed or deleted phrases; and (4) it permits the parser to skip words at
controlled places in the input (i.e. it enables a reasonable specification
of control). These points will be discussed in detail in a later section.

The characterization aspect is important for two reasons: (1) It
provides a handle on the problem of constructing a habitable sub-language.
The system knows how to deal with a particular set of tasks over a
particular set of objects. The sub-language can be partitioned by tasks to
accept all straightforward ways of expressing those tasks, but does not
need to worry about others; (2) It allows a reduction in the number of
sentences that must be accepted by the language while still maintaining
habitability. There may be syntactic constructs that are used frequently
with one concept (task) but seldom with another. For example, relative
clauses may be useful in explaining the reasons for performing an
experimental test but are an awkward (though possible) way of requesting a
measurement. By separating the processing along semantic grounds, one may
gain efficiency by not having to accept the awkward phrasing.



(11) This is not actually a rule from the grammar but is merely intended to
be suggestive.

Representation of Meaning

Since natural language communication is the transmission of concepts
via phrases, the "meaning" of a phrase is its correspondent in the
conceptual space. The entities in SOPHIE's conceptual space are objects,
relationships between objects, and procedures for dealing with objects.
The meaning of a phrase can be a simple data object (e.g. "current limiting
transistor") or a complex data object (e.g. "C5 open", "Voltage at node
1"). The meaning of a question is a call to a procedural specialist that
knows how to determine the answer. The meaning of a command is a call to a
procedure that performs the specified action. (12) For example, the
procedural specialist DOFAULT knows how to fault the circuit and is used to
represent the meaning of commands to fault the circuit (e.g. "Open R9",
"Suppose C2 shorts and R9 opens"). The argument that DOFAULT needs in
order to perform its task is an instance of the concept of faults that
specifies the particular changes to be made, e.g. "R9 being open". These
same concepts of particular faults also serve as arguments to two other
specialists: HYPTEST, which determines the consistency of a fault with
respect to the present context, e.g. "Could R9 be open"; and SEEFAULT,
which checks the actual status of the circuit, e.g. "Is R9 open?".
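
The way one fault concept serves as the argument to three different
specialists can be sketched in Python as below. Only the names DOFAULT,
HYPTEST and SEEFAULT come from the text; the `Fault` class, the set used
as a circuit model, and the `predict` hook are assumptions made for
illustration and say nothing about how SOPHIE's specialists are written.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fault:
    """An instance of the concept of a fault, e.g. 'R9 being open'."""
    part: str
    mode: str

def dofault(circuit_faults, fault):
    """Command ('Open R9'): apply the fault to the circuit model."""
    circuit_faults.add(fault)

def seefault(circuit_faults, fault):
    """Question ('Is R9 open?'): check the actual status of the circuit."""
    return fault in circuit_faults

def hyptest(fault, observations, predict):
    """Question ('Could R9 be open?'): list the observations the fault cannot
    explain. `predict(fault, measurement)` is an assumed simulator hook."""
    return [(m, v) for m, v in observations if predict(fault, m) != v]

r9_open = Fault("R9", "open")
faults = set()
dofault(faults, r9_open)
print(seefault(faults, r9_open))                  # True
print(seefault(faults, Fault("C2", "shorted")))   # False

toy_predictions = {(r9_open, "output voltage"): 1.2}
predict = lambda f, m: toy_predictions[(f, m)]
print(hyptest(r9_open, [("output voltage", 1.2)], predict))   # []
```

The parser's job in this picture is only to build the `Fault` value and
choose the specialist; what the request means is what the specialist does.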

Result of the Parsing

Basing the grammar on conceptual entities allows the semantic
interpretation (the determination of the concept underlying a phrase) to
proceed in parallel with the parsing. Since each of the non-terminal
categories in the grammar is based on a semantic unit, each grammar rule
can specify the semantic description of a phrase that it recognizes in much
the same way that a syntactic grammar specifies a syntactic description.
The construction portion of the rules is procedural. Each rule has the
freedom to decide how the semantic descriptions, returned by the
constituent items of that rule, are to be put together to form the correct
"meaning".

(12) Declarative statements are treated as requests because the pragmatics
of the situation imply that the student is asking for verification of his
statement. For example, "I think C2 is shorted" is taken to be a request
to have the hypothesis "C2 is shorted" critiqued.



For example, the meaning of the phrase "Q5" is the data base object
Q5. The meaning of the phrase "the collector of Q5" is (COLLECTOR Q5)
where COLLECTOR is a function that returns the data base item that is the
collector of the given transistor. For a more complicated example,
consider the non-terminal <MEASUREMENT> shown in Figure 4.1.

Figure 4.1
A Semantic Grammar Rule (13)

<MEASUREMENT> := output <MEAS/QUANT> [of <TRANSFORMER>] !
                 <TRANSFORMER> <MEAS/QUANT> !
                 <MEAS/QUANT> between <NODE> and <NODE> !
                 <MEAS/QUANT> <PREP> <PART> !
                 <MEAS/QUANT> between output terminals !
                 <MEAS/QUANT> <PREP> <JUNCTION> !
                 <MEAS/QUANT> <PREP> <NODE> !
                 <JUNCTION/TYPE> <MEAS/QUANT> of <TRANSISTOR/SPEC> !
                 <TRANSISTOR/TERM/TYPE> <MEAS/QUANT> of <TRANSISTOR>



The goal for this non-terminal is to capture all of the ways that a student
can specify a measurement (voltage across D3, output current, etc.). To
specify a measurement, there must be a quantity to be measured, <MEAS/QUANT>
(e.g. voltage, current, resistance, power dissipation), and something to
measure (e.g. with respect to a part, <PART/SPEC>; a transistor junction,
<JUNCTION>; or possibly a point in the circuit, <NODE>). The rule for
<MEASUREMENT> expresses all of the ways that the student can give a
measurable quantity and also supply its required arguments. The structure
which results from <MEASUREMENT> is a function call to the function MEASURE
which supplies the quantity being measured and other arguments specifying
where to measure it. Thus the meaning of the phrase "the voltage at the
collector of Q5" is (MEASURE VOLTAGE (COLLECTOR Q5)) which was generated
from the control structure:



(13) The rule is expressed in a BNF-like notation which is an abstraction
of the actual rule (see next section). Non-terminals are in capital
letters and enclosed in angle brackets. Terminals are in lower case.
Brackets enclose optional elements. Alternative right hand sides are
separated by a "!".



measurement 

• / \ 

meas/quant node 

■1 ■ I ■ 

voltage terminal 



terminal/type parl: 

1 i 

collector Q5 
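
The composition of such a form can be sketched as follows. This is only an illustration in Python rather than the report's LISP: nested tuples stand in for LISP lists, and the helper names collector, measure and to_lisp are assumptions introduced here, not part of SOPHIE.

```python
# Semantic forms as nested tuples, mirroring the LISP lists in the text.

def collector(transistor):
    """Meaning of "the collector of <transistor>"."""
    return ("COLLECTOR", transistor)

def measure(quantity, place):
    """Meaning of a <MEASUREMENT>: a call to the specialist MEASURE."""
    return ("MEASURE", quantity, place)

def to_lisp(form):
    """Render a nested tuple in parenthesized LISP style."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_lisp(part) for part in form) + ")"
    return str(form)

# "the voltage at the collector of Q5"
form = measure("VOLTAGE", collector("Q5"))
print(to_lisp(form))  # (MEASURE VOLTAGE (COLLECTOR Q5))
```

Each constituent contributes its own piece of the form, so the nesting of the result follows the nesting of the control structure shown above.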



A careful examination of Figure 4.1 reveals that <MEASUREMENT> also
accepts "meaningless" phrases such as "the power dissipation of Node 4."
In addition, it accepts some meaningful phrases such as "the resistance
between Node 3 and Node 14" which SOPHIE does not calculate. This results
from generalizing together concepts which are not treated identically in
the surface structure. In this case, voltage, current, resistance and
power dissipation were generalized to the concept of a measurable quantity.
Allowing the grammar to accept more statements and having the
argument-checking done by the procedural specialists has the advantage of
allowing the semantic routines to provide the feedback as to why a sentence
cannot be interpreted or "understood". It also keeps the grammar from
being cluttered with special rules for blocking meaningless phrases.
Carried to the limit, the generalization strategy would return the grammar
to being "syntactic" again (e.g. all data objects are "noun phrases"). The
trick is to leave semantics in the grammar when it is beneficial -- to stop
extraneous parsings early, or tighten the range of a referent for an
ellipsis or deletion. This is obviously a task-specific trade-off.(14)
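
The division of labor between a permissive grammar and an argument-checking specialist can be sketched as below. This is a hedged illustration in Python, not SOPHIE's code: the table ALLOWED_PLACES, its entries, and the wording of the messages are assumptions made for the example.

```python
# Hypothetical table: the classes of place at which each quantity can be
# measured. The entries are illustrative only; the report does not list them.
ALLOWED_PLACES = {
    "VOLTAGE": {"NODE", "PART", "JUNCTION"},
    "CURRENT": {"PART", "JUNCTION"},
    "POWER DISSIPATION": {"PART"},
}

def check_measure(quantity, place_class):
    """Return (ok, message) for a MEASURE call the grammar already accepted."""
    allowed = ALLOWED_PLACES.get(quantity)
    if allowed is None:
        return False, f"I cannot calculate {quantity.lower()}."
    if place_class not in allowed:
        return False, (f"{quantity.capitalize()} cannot be measured at a "
                       f"{place_class.lower()}.")
    return True, "ok"

# "the power dissipation of Node 4" parses, but the specialist objects.
print(check_measure("POWER DISSIPATION", "NODE"))
# "the resistance between Node 3 and Node 14" parses, but is not calculated.
print(check_measure("RESISTANCE", "NODE"))
```

Because the rejection happens in the specialist rather than in the grammar, the student receives a reason for the failure instead of a bare failure to parse.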



(14) Bobrow and Brown (1975) describe an interesting paradigm from which to
consider this trade-off.



The relationship between a phrase and its meaning is usually
straightforward. However, it is not limited to simple embedding. Consider
the phrases "the base emitter of Q5 shorted" and "the base of Q5 shorted to
the emitter". The thing which is "shorted" in both of these phrases is the
"base emitter junction of Q5". The rule that recognizes both of these
phrases, <PART/FAULT/SPEC>, can handle the first phrase by invoking its
constituent concepts of <JUNCTION> (base emitter of Q5) and <FAULT/TYPE>
(shorted) and combining their results. In the second phrase, however, it
must construct the proper junction from the separate occurrences of the two
terminals involved. Figure 4.2 gives the rules used to recognize these two
situations. The situations are distinguished by the occurrence of the
optional constituent in the second phrase. (As will be discussed later,
the rules are procedurally encoded, which provides a natural way of
building separate semantic forms for the two cases.) Notice that the
parser does some paraphrasing, as the "meaning" of the two phrases is the
same.

                              Figure 4.2
                             Grammar Rules

<PART/FAULT/SPEC> := <FAULTABLE/THING> is <FAULT/TYPE>
                         [to <TRANSISTOR/TERMINAL/TYPE>]
<FAULTABLE/THING> := <JUNCTION> ! <TERMINAL> ! <PART>
<FAULT/TYPE> := open ! shorted
<TRANSISTOR/TERMINAL/TYPE> := base ! emitter ! collector
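
A procedural encoding of the rule in Figure 4.2 might look roughly like the following Python sketch. It is a simplification under stated assumptions: only transistor terminals are handled (not arbitrary parts), the word "is" is required as in the figure, and the form names FAULT and JUNCTION are hypothetical, since the report does not give the semantic form for this rule.

```python
TERMINAL_TYPES = {"base", "emitter", "collector"}
FAULT_TYPES = {"open", "shorted"}

def parse_part_fault_spec(words):
    """Parse "<terminal type(s)> of <part> is <fault type> [to <terminal type>]".

    Returns a semantic form as a nested tuple, or None if the words do not fit.
    """
    words = [w.lower() for w in words if w.lower() != "the"]
    terminals = []
    i = 0
    while i < len(words) and words[i] in TERMINAL_TYPES:
        terminals.append(words[i].upper())
        i += 1
    if not terminals or words[i:i + 1] != ["of"] or i + 3 >= len(words):
        return None
    part, is_word, fault = words[i + 1].upper(), words[i + 2], words[i + 3]
    if is_word != "is" or fault not in FAULT_TYPES:
        return None
    rest = words[i + 4:]
    if rest:  # optional constituent: "to <terminal type>"
        if len(rest) != 2 or rest[0] != "to" or rest[1] not in TERMINAL_TYPES:
            return None
        terminals.append(rest[1].upper())
    if len(terminals) == 2:
        thing = ("JUNCTION", terminals[0], terminals[1], part)
    elif len(terminals) == 1:
        thing = (terminals[0], part)
    else:
        return None
    return ("FAULT", thing, fault.upper())

first = parse_part_fault_spec("the base emitter of Q5 is shorted".split())
second = parse_part_fault_spec("the base of Q5 is shorted to the emitter".split())
print(first == second)  # True
```

The optional "to ..." constituent is what tells the procedure to assemble the junction from two separate terminals, so both phrasings end in the same form.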

This discussion has been presented as if the concepts were defined a
priori by the capabilities of the system. Actually, for the system to
remain at all habitable, the concepts are discovered in the interplay
between the statements that are made in the domain and the capabilities of
the system. When a particular English construct is difficult to handle, it
is probably an indication that the concept it is trying to express has not
been recognized properly by the system. In our example "the base of Q5 is
shorted to the emitter", the relationship between the phrase and its
meaning is awkward because the present concept of shorting requires a part
or a junction. The example is getting at a concept of shorting in which
any two terminals can be shorted together (e.g. "the positive terminal of
R9 is shorted to the anode of D6"). This is a viable conceptual view of
"shorting", but its implementation requires allowing arbitrary changes in
the topology of the circuit which is beyond the efficiency limitations of
SOPHIE's simulator. Thus, the system we were working with led us to define
the concept in too limited a way.

USE OF SEMANTIC INFORMATION DURING PARSING
Prediction

Having described the notion of a semantic grammar, we will now
describe the ways it allows semantic information to be used in the
understanding process. One use of semantic grammars is to predict the
possible alternatives that must be checked at a given point. Consider, for
example, the phrase "the voltage at xxx". After the word "at" is reached
in the top-down, left-to-right parse, the grammar rule corresponding to the
concept "measurement" can predict very specifically the conceptual nature
of "xxx": it must be a phrase that directly or indirectly specifies a
location in the circuit. For example, "xxx" could be "the junction of the
current limiting section and the voltage reference source" but cannot be
"3 ohms".

Semantic grammars also have the effect of reducing the amount of
grammatical ambiguity. In the phrase "the voltage at xxx", the
prepositional phrase "at xxx" will be associated with the noun "voltage"
without considering any alternative parses that associate it someplace
higher in the tree.

Predictive information is also used to aid in the determination of
referents for pronouns. If the above phrase were "the voltage at it", the
grammar would be able to restrict the class of possible referents to
locations. By taking advantage of the available sentence context to
predict the semantic class of possible referents, the referent
determination process is greatly simplified. For example:

     (1a) Set the voltage control to .3?
     (1b) What is the current thru R9?
     (1c) What is it with it set to .9?

In (1c), the grammar is able to recognize that the first "it" refers to a
measurement that the student would like re-taken under slightly different
conditions. The grammar can also decide that the second "it" refers to
either a potentiometer or to the load resistance (i.e. one of those things
which can be set). The referent for the first "it" is the measurement
taken in (1b), "the current thru R9". The referent for the second "it" is
"the voltage control" which is an instance of a potentiometer. The context
mechanism that selects the referents will be discussed later.

Simple Deletion

The semantic grammar is also used to recognize simple deletions. The
grammar rule for each conceptual entity knows the nature of that entity's
constituent concepts. When a rule cannot find a constituent concept, it
can either:

a) fail (if the missing concept is considered to be obligatory in the
surface structure representation) or,

b) hypothesize that a deletion has occurred and continue.

For example, the concept of a TERMINAL has as one of its realizations the
constituent concepts of a TERMINAL-TYPE and a PART. When its grammar rule
finds only the phrase "the collector", it uses this information to posit
that a part has been deleted (i.e. TERMINAL-TYPE gets instantiated to "the
collector" but nothing gets instantiated to PART). The natural language
processor then uses the dependencies between the constituent concepts to
determine that the deleted PART must be a TRANSISTOR. The "meaning" of
this phrase is then "the collector of some transistor". Which transistor
is determined when the meaning is evaluated in the present dialogue
context. In particular, the semantic form returned is the function PREF
and the classes of possible referents; in our example the form would be
(COLLECTOR (PREF '(TRANSISTOR))).(15) The operation of PREF will be
discussed later.
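
The two choices open to a rule, failing or positing a deletion, can be sketched in Python as follows. This is an illustration only: the function parse_terminal and its word-level matching are assumptions, and nested tuples stand in for the LISP form (COLLECTOR (PREF '(TRANSISTOR))).

```python
TERMINAL_TYPES = {"base", "emitter", "collector"}

def parse_terminal(words):
    """Parse "<terminal type> [of <part>]" into a semantic form.

    If the part is missing, posit a deletion and leave a PREF form that
    names the class of acceptable referents.
    """
    words = [w.lower() for w in words if w.lower() != "the"]
    if not words or words[0] not in TERMINAL_TYPES:
        return None                      # obligatory constituent missing: fail
    terminal_type = words[0].upper()
    if len(words) == 3 and words[1] == "of":
        return (terminal_type, words[2].upper())
    if len(words) == 1:
        # base, emitter and collector belong only to transistors, so the
        # deleted PART must be a TRANSISTOR.
        return (terminal_type, ("PREF", ("TRANSISTOR",)))
    return None

print(parse_terminal("the collector of Q5".split()))  # ('COLLECTOR', 'Q5')
print(parse_terminal("the collector".split()))
```

The PREF form is left unevaluated here; choosing the actual transistor is deferred until the form is evaluated against the dialogue context.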

Ellipsis 

Another use of the semantic grammar allows the processor to recognize
elliptic utterances. These are utterances that do not express complete
thoughts -- a completely specified question or command -- but only give
differences between the intended thought and an earlier one.(16) For
example, 2b, 2c and 2d are elliptic utterances.

     (2a) What is the voltage at Node 5?
     (2b) At Node 1?
     (2c) and Node 2?
     (2d) What about between nodes 7 and 8?

(15) The language LISP will be used in examples throughout this thesis. In
LISP, a function call is expressed in Cambridge-Polish notation: as a
parenthesized list of the function name followed by its arguments.

Ellipses can begin with introductory phrases such as "and" in 2c or "what
about" in 2d; however this is not required as can be seen in 2b. Part of
the ellipsis rule is given in Figure 4.3.

                              Figure 4.3
                             Ellipsis Rule

<ELLIPSIS> := [<ELLIPSIS/INTRODUCER>] <REQUEST/PIECE> !
              [<ELLIPSIS/INTRODUCER>] if <PART/FAULT/SPEC>
<REQUEST/PIECE> := [<PREP>] <NODE> !
                   [<PREP>] <PART> !
                   between <NODE> and <NODE> !
                   [<PREP>] <JUNCTION> !
                   etc.

The grammar rule identifies which concept or class of concepts are possible
from the context available in the elliptic utterance.

While the parser is usually able to determine the intended concepts
from the context available in an elliptic utterance, this is not always the
case. Consider the following two sequences of statements.

     (3a) What is the voltage at Node 5?
     (3b) 10?

     (4a) What is the output voltage if the load is 100?
     (4b) 10?

In (3b), "10" refers to node 10, while in (4b) it refers to a load of 10.
The problem this presents to the parser is that the concepts underlying
these two elliptic utterances have nothing in common except their surface
realizations. The parser, which operates from conceptual entities, does not
have a concept that includes both of these interpretations. One solution
would be to have the parser find all parses (concepts) and then choose
between them on the basis of context. Unfortunately, this would mean that
time is wasted looking for more than one parse for the large percentage of
sentences in which it is not necessary to do so. A better solution would
be to allow structure among the concepts, so that the parser would
recognize "10" as a member of the concept "number". Then the routines that
find the referent would know that numbers can be either node numbers or
values. This type of recognition could profitably be performed by a
bottom-up approach to parsing. However, its advantages over the present
scheme are not enough to justify the expense incurred by a bottom-up parse
to find all possible well-formed constituents. At present, the parser
assumes one interpretation, and a message is printed to the student
indicating the assumed interpretation. If it is wrong, the student must
supply more context in his request. In fact, "10?" is taken as a load
specification and if the student meant the node he would have to use "at
10", "N10" or "Node 10". Later we will discuss the mechanism that
determines to which complete thought an ellipsis refers.

(16) The standard use of the word "ellipsis" refers to any deletion.
Rather than invent a new word, we shall use the restricted meaning here.

USING CONTEXT TO DETERMINE REFERENTS
Pronouns and Deletions

Once the parser has determined the existence and class (or set of
classes) of a pronoun or deleted object, the context mechanism is invoked
to determine the proper referent. This mechanism has a history of student
interactions during the current session which contains, for each
interaction, the parse (meaning) of the student's statement and the
response calculated by the system. This list provides the range of
possible referents and is searched in reverse order to find an object of
the proper semantic class (or one of the proper classes). To aid in the
search, the context mechanism knows how each of the procedural specialists
appearing in a parse uses its arguments. For example, the specialist
MEASURE has a first argument that must be a quantity and a second argument
that must be a part, a junction, a section, a terminal or a node. Thus
when the context mechanism is looking for a referent that can either be a
PART or a JUNCTION, it will look at the second argument of a call to
MEASURE but not the first. Using the information about the specialists,
the context mechanism looks in the present parse and then in the next most
recent parse, etc., until an object from one of the specified classes is
found.

The significance of using the specialist to filter the search instead
of just keeping a list of previously mentioned objects is that it avoids
mis-interpretations due to object-concept ambiguity. As an example,
consider the following sequence from the sample dialogue in Chapter 3:

     (5) What is the current thru the CC when the VC is 1.0?
     (6) What is it when it is .8?

Sentence (5) will be recognized by the following rules from the semantic
grammar:

     $1) <REQUEST> := <SIMPLE/REQUEST> when <SETTING/CHANGE>
     $2) <SIMPLE/REQUEST> := what is <MEASUREMENT>
     $3) <MEASUREMENT> := <MEAS/QUANT> <PREP> <PART>
     $4) <SETTING/CHANGE> := <CONTROL> is <CONTROL/VALUE>
     $5) <CONTROL> := VC

with a resulting semantic form of:

     (RESETCONTROL (STQ VC 1.0)
                   (MEASURE CURRENT CC))

RESETCONTROL is a function whose first argument specifies a change to
one of the controls and whose second argument consists of a form to be
evaluated in the resulting instrument context. STQ is used to change the
setting of one of the controls. The first argument to MEASURE gives the
quantity to be measured. The second specifies where it is to be measured.
To recognize sentence (6), the applications of rules $2 and $5 are changed.
There is an alternative rule for <SIMPLE/REQUEST> that looks for those
anaphora that refer to a measurement. These phrases, such as "it", "that
result" or "the value", are recognized by the non-terminal
<MEASUREMENT/PRONOUN>. The alternative to $2 that would be used to parse
(6) is:

     <SIMPLE/REQUEST> := what is <MEASUREMENT/PRONOUN>

The semantics of <MEASUREMENT/PRONOUN> indicate that an entire measurement
has been deleted. The alternative to rule $5:

     <CONTROL> := it

recognizes "it" as an acceptable way to specify a control. The resulting
semantic form for sentence (6) is:

     (RESETCONTROL (STQ (PREF '(CONTROL)) .8)
                   (PREF '(MEASUREMENT)))



The function PREF searches back through the context of previous semantic
forms to find the most recent mention of a member of one of the classes. In
the above example, it will find the control VC but not CC because the
character imposed on the arguments of MEASURE is that of a "part" not a
"control".(17) The presently recognized classes for deletions are PART,
TRANSISTOR, FAULT, CONTROL, POT, SWITCH, DIODE, MEASUREMENT and QUANTITY.
(The members of the classes are derived from the semantic network
associated with a circuit.)
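
The search performed by PREF can be sketched in Python as below. This is a hedged illustration, not the SOPHIE implementation: the table ARG_CLASSES is an assumed subset, and the class names SETTING and VALUE are hypothetical labels added so that every argument position has an entry.

```python
# Classes each specialist imposes on its arguments (illustrative subset).
ARG_CLASSES = {
    "MEASURE": [{"QUANTITY"},
                {"PART", "JUNCTION", "SECTION", "TERMINAL", "NODE"}],
    "STQ": [{"CONTROL"}, {"VALUE"}],
    "RESETCONTROL": [{"SETTING"}, {"MEASUREMENT"}],
}

def _search(form, wanted):
    """Depth-first search of one semantic form for an argument whose
    imposed class is one of the wanted classes."""
    head, args = form[0], form[1:]
    for arg, classes in zip(args, ARG_CLASSES.get(head, [])):
        if classes & wanted:
            return arg                   # used here as one of the wanted classes
        if isinstance(arg, tuple):
            found = _search(arg, wanted)
            if found is not None:
                return found
    return None

def pref(wanted, history):
    """Find the most recent object that was used as one of the wanted classes."""
    for form in reversed(history):       # most recent parse first
        found = _search(form, wanted)
        if found is not None:
            return found
    return None

# Parse of sentence (5), as given in the text.
history = [("RESETCONTROL", ("STQ", "VC", 1.0), ("MEASURE", "CURRENT", "CC"))]
print(pref({"CONTROL"}, history))        # VC
print(pref({"MEASUREMENT"}, history))    # ('MEASURE', 'CURRENT', 'CC')
```

CC is never offered as a control because the only position it occupies, the second argument of MEASURE, imposes the character of a part.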

Referents for Ellipses

If the problem of pronoun resolution is looked upon as finding a
previously mentioned object for a currently specified use, then the problem
of ellipsis can be thought of as finding a previously mentioned use for a
currently specified object. For example:

     (7) What is the base current of Q4?
     (8) In Q5?

The given object is "Q5", and the earlier function is "base current". For
a given elliptic phrase, the semantic grammar identifies the concept (or
class of concepts) involved. In (8), since Q5 is recognized by the
non-terminal <TRANSISTOR/SPEC>, the class would be TRANSISTOR. The context
mechanism then searches for a specialist in a previous parse that accepted
the given class as an argument. When one is found, the new phrase is
placed in the proper argument position and the modified parse is used as
the meaning of the ellipsis.
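
The substitution step can be sketched in Python as follows. The sketch rests on two assumptions that the report does not state: that sentence (7) parses to (MEASURE CURRENT (BASE Q4)), and that the terminal functions accept an argument of class TRANSISTOR. The names substitute and resolve_ellipsis are likewise introduced only for illustration.

```python
# Assumed argument classes; BASE is taken to accept a TRANSISTOR.
ARG_CLASSES = {
    "MEASURE": [{"QUANTITY"},
                {"PART", "JUNCTION", "SECTION", "TERMINAL", "NODE"}],
    "BASE": [{"TRANSISTOR"}],
}

def substitute(form, new_object, new_class):
    """Return a copy of form with new_object in the first argument position
    that accepts new_class, or None if no position accepts it."""
    head, args = form[0], list(form[1:])
    for i, classes in enumerate(ARG_CLASSES.get(head, [])):
        if i < len(args) and new_class in classes:
            args[i] = new_object
            return (head, *args)
    for i, arg in enumerate(args):
        if isinstance(arg, tuple):
            changed = substitute(arg, new_object, new_class)
            if changed is not None:
                args[i] = changed
                return (head, *args)
    return None

def resolve_ellipsis(new_object, new_class, history):
    """Search previous parses, most recent first, for a use of new_class."""
    for form in reversed(history):
        meaning = substitute(form, new_object, new_class)
        if meaning is not None:
            return meaning
    return None

history = [("MEASURE", "CURRENT", ("BASE", "Q4"))]   # assumed parse of (7)
print(resolve_ellipsis("Q5", "TRANSISTOR", history))
```

The earlier parse is copied rather than modified, so the history still holds the original question if a later ellipsis needs it.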

(17) The character imposition as described is too strong. For example:
     $1) What are the specs of Q5?
     $2) What is the voltage at its emitter?
The character imposed on Q5 in $1 is that of a part which means that the
context mechanism invoked by $2 which is looking for a transistor won't
find it. This example is handled by relaxing the restrictions the
procedural specialist in $1 puts on its argument (i.e. it can be either a
PART or a TRANSISTOR). In spite of this weakness in the argument
limitation approach, we have found it to be a useful means of reducing the
search time and avoiding some obvious mis-interpretations.

Limitations to the Context Mechanism

The method of semantic classification (to determine reference) is very
efficient and works well over our domain. It definitely does not solve all
the problems of reference. Charniak has pointed out the substantial
problems of reference in a domain as seemingly simple as children's stories
(1972). One of his examples demonstrates how much world knowledge may be
required to determine a referent (1972 p. 7).

     Janet and Penny went to the store to get presents for Jack. Janet
     said "I will get Jack a top." "Don't get Jack a top" said Penny. "He
     has a top. He will make you take it back."

Charniak argues that to understand to which of the two tops "it"
refers requires knowing about presents, stores and what they will take
back, etc. Even in domains where it may be possible to capture all of the
necessary knowledge, classification may still lead to ambiguities. For
example, consider the following:

     (9) What is the voltage at Node 5 if the load is 100?
     (10) Node 6?
     (11) 7?

In statement (11) the user means Node 7. In statement (10), he has
reinforced the use of ellipsis as referring to node number. (For example,
leaving out statement (10), sentence (11) is much more awkward.) On the
other hand, if statement (11) had been "1000" or if statement (10) had been
"10?", things would be more problematic. When statement (11) is "1000", we
can infer that he means a load of 1000 because there is no node 1000. If
statement (10) had been "10?", there would be genuine ambiguity slightly
favoring the interpretation as a load because that was the last number
mentioned. The major limitation of the current technique, which must be
overcome in order to tackle significantly more complicated domains, is its
inability to return more than one possible referent. It considers each one
individually until it finds one which is satisfactory. The amount of work
involved in employing a technique which allows comparing referents has not
been justified by our experience.

RELATIONSHIP TO OTHER SEMANTIC SYSTEMS

The relationship between semantic grammars and purely semantic
systems (Quillian 1969; Schank et al. 1975) and to some extent Wilks
(1973a, 1973b) parallels the distinction between procedural and declarative
knowledge. The relationship that exists between nodes in the semantic
network structure contains little or no information about how these
relationships might be expressed in language. An interpretation mechanism
must decide where the information is useful. While this is, in some sense,
more general (the same information can be used for several purposes given
the proper interpreter), it is necessarily less efficient. (Wilks has
extracted some expressive information, primarily concept order, into his
templates.) A semantic grammar, on the other hand, is written for the
process of recognizing concepts as they are expressed in the surface
structures.

FUZZINESS

Having the grammar centered around semantic categories allows the
parser to be sloppy about the actual words it finds in the statement.
Having a concept in mind, and being willing to ignore words to find it, is
the essence of keyword parsing schemes. It is effective in those cases
where the words that have been skipped are either redundant, or specify
gradations of an idea that are not distinguished by the system. For
example, in the sentence: "Insert a very hard fault", "very" would be
ignored; this is effective because the system does not have any further
structure over the class of hard faults. In the sentence: "What is the
voltage across resistor R8?" resistor can be ignored because it is implied
by "R8".(18)

One advantage that a procedural encoding of the grammar (discussed
later) has over pattern matching schemes in the implementation of fuzziness
is its ability to control exactly where words can be ignored. This
provides the ability to blend pattern matching parsing of those concepts
that are amenable to it with the structural parsing required by more
complex concepts. The amount of fuzziness -- how many, if any, words in a
row can be ignored -- is controlled in two ways. First, whenever a grammar
rule is invoked, the calling rule has the option of limiting the number of
words that can be skipped. Second, each rule can decide which of its
constituent pieces or words are required and how tightly controlled the
search for them should be. In SOPHIE, the normal mode of operation of the
parser is tight in the beginning of a sentence, but fuzzier after it has
made sense out of something.
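
The two controls on fuzziness can be sketched in Python as follows. This is an illustration under assumptions: the helper find_word, the rule parse_insert_fault and the form (INSERT-FAULT HARD) are hypothetical and are not taken from the SOPHIE grammar.

```python
def find_word(words, start, wanted, max_skip):
    """Look for one of the wanted words at or after position start,
    ignoring at most max_skip words in a row.

    Returns the index just past the matched word, or None on failure.
    """
    for skipped in range(max_skip + 1):
        i = start + skipped
        if i < len(words) and words[i] in wanted:
            return i + 1
    return None

def parse_insert_fault(words, max_skip=2):
    """Recognize "insert ... hard fault"; max_skip is set by the caller."""
    words = [w.lower() for w in words]
    i = find_word(words, 0, {"insert"}, 0)        # tight at the start
    if i is None:
        return None
    i = find_word(words, i, {"hard"}, max_skip)   # fuzzier once under way
    if i is None:
        return None
    i = find_word(words, i, {"fault"}, 0)         # required, no skipping
    if i is None:
        return None
    return ("INSERT-FAULT", "HARD")

print(parse_insert_fault("Insert a very hard fault".split()))
print(parse_insert_fault("Insert a very hard fault".split(), max_skip=1))
```

With the default limit the words "a very" are skipped; when the caller lowers the limit to one word the same sentence is rejected, which is the first of the two controls described above.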

(18) The first of these examples could be handled by making "very" a noise
word (i.e. deleting it from all sentences). Resistor however is not a
noise word in all cases (e.g. "What is the current through the current
sensing resistor?") and hence cannot be deleted.

Fuzziness has two other advantages worth mentioning briefly. It
reduces the size of the dictionary because all known noise words don't have
to be included. In those cases where the skipped words are meaningful, the
misunderstanding may provide some clues to the user which allow him to
restate his query.

PREPROCESSING 

Before a statement is parsed, a preprocessor performs three
operations. The first expands abbreviations, deletes known noise words,
and canonicalizes similar words to a common form. The second is a cursory
spelling correction. The third is a reduction of compound words.

Spelling correction is attempted on any word of the input string that
the system does not recognize. The spelling correction algorithm(19) takes
the possibly misspelled word and a list of correctly spelled words, and
determines which, if any, of the correct words is close to the misspelled
word (using a metric determined by number of transpositions, doubled
letters, dropped letters, etc.). During the initial preprocessing, the
list of correct words is very small (approximately a dozen) and is limited
to very commonly misspelled words and/or words that are critical to the
understanding of a sentence. The list is kept small so that the time spent
attempting spelling correction, prior to attempting a parse, is kept to a
minimum. Remember that the parser has the ability to ignore words in the
input string so we do not want to spend a lot of time correcting a word
that won't be needed in understanding the statement. But notice that
certain words can be critical to the correct understanding of a statement.
For example, suppose that the phrase "the base emitter current of Q3" was
incorrectly typed as "the bse emitter current of Q3". If "bse" were not
recognized as being "base" the parser would ignore it and (mis-)understand
the phrase as "the emitter current of Q3", a perfectly acceptable but much
different concept.(20) Because of this problem, words like "base", which
if ignored have been found to lead to misunderstandings, are considered
critical and their spelling is corrected before any parse is attempted.
Note that there are a lot of words -- "capacitor", "replace", "open", for
example -- that if misspelled would prevent the parser from making sense of
the statement, but would not lead to any mis-understandings. These words
therefore are not considered to be critical, and would be corrected in the
second attempt at spelling correction that is done after a statement fails
to parse.

(19) The spelling correction routines are provided by INTERLISP and were
developed by Teitelman for use in the DWIM facility (Teitelman 1969, 1974).

(20) To minimize the consequences of such misinterpretation, the system
always responds with an answer that indicates what question it is
answering, rather than just giving the numeric answer.
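
A first pass of this kind can be sketched in Python as below. The closeness test is a deliberately simplified stand-in, not the INTERLISP DWIM algorithm: it accepts a word only if it differs from a known word by one transposition of adjacent letters, one dropped letter, or one doubled letter. The contents of CRITICAL_WORDS are illustrative; the report says only that the list holds about a dozen words and includes "base".

```python
def is_close(word, correct):
    """True if word differs from correct by one transposition of adjacent
    letters, one dropped letter, or one doubled letter."""
    if word == correct:
        return True
    if len(word) == len(correct):            # possible transposition
        diffs = [i for i in range(len(word)) if word[i] != correct[i]]
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and word[diffs[0]] == correct[diffs[1]]
                and word[diffs[1]] == correct[diffs[0]])
    if len(word) + 1 == len(correct):        # a letter was dropped
        return any(correct[:i] + correct[i + 1:] == word
                   for i in range(len(correct)))
    if len(word) == len(correct) + 1:        # a letter was doubled
        return any(word[i] == word[i + 1] and word[:i] + word[i + 1:] == correct
                   for i in range(len(word) - 1))
    return False

CRITICAL_WORDS = ["base", "emitter", "collector"]   # illustrative list

def correct_spelling(word, known=CRITICAL_WORDS):
    """Return the first known word that is close to word, or None."""
    for candidate in known:
        if is_close(word, candidate):
            return candidate
    return None

print(correct_spelling("bse"))       # base
print(correct_spelling("capacitr"))  # None
```

A word such as "capacitr" is left alone in this pass because "capacitor" is not on the short critical list; it would only be examined in the second attempt, after a failed parse.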

Compound words are single concepts that appear in the surface
structure as a fixed series of more than one word. Their reduction is very
important to the efficient operation of the parser. For example, in the
question "what is the voltage range switch setting?", "voltage range
switch" is rewritten as the single item "VR". If not rewritten, "voltage"
would be mistaken as the beginning of a measurement (as in "what is the
voltage at N4") and an attempt would have to be made to parse "range switch
setting" as a place to measure voltage. Of course after this failed, the
correct parse can still be found, but reducing compound words helps to
avoid backtracking. In addition, the reduction of compound words
simplifies the grammar rules by allowing them to work with larger
conceptual units. In this sense, the preprocessing can be viewed as a
preliminary bottom-up parse that recognizes local, multi-word concepts.
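
The reduction step can be sketched in Python as follows. Only the "voltage range switch" entry comes from the text; the table name COMPOUNDS and the longest-match-first strategy are assumptions made for the illustration.

```python
COMPOUNDS = {
    ("voltage", "range", "switch"): "VR",   # the example given in the text
}

def reduce_compounds(words, compounds=COMPOUNDS):
    """Replace each known multi-word sequence with its single-item form."""
    longest = max((len(key) for key in compounds), default=0)
    out, i = [], 0
    while i < len(words):
        # Try the longest possible compound first, down to two words.
        for n in range(min(longest, len(words) - i), 1, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in compounds:
                out.append(compounds[key])
                i += n
                break
        else:                                # no compound starts here
            out.append(words[i])
            i += 1
    return out

print(reduce_compounds("what is the voltage range switch setting".split()))
# ['what', 'is', 'the', 'VR', 'setting']
```

After the rewrite the parser never sees the word "voltage" in this question, so it is not tempted to start parsing a measurement.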

IMPLEMENTATION

Once the dependencies between semantic concepts have been expressed in
the BNF form, each rule in the grammar is encoded (by hand) as a LISP
procedure. This encoding process imparts to the grammar a top-down control
structure, specifies the order of application of the various alternatives
of each rule, and defines the process of pattern matching each rule. The
resulting collection of LISP functions constitutes a goal-oriented parser
in a fashion similar to SHRDLU (Winograd 1973), but without the
backtracking ability of PROGRAMMAR.

As has been argued elsewhere (Woods 1970; Winograd 1973), encoding the
grammars as procedures -- including the notion of process in the grammar --
has advantages over using traditional phrase structure grammar
representations. Four of these advantages are:

1) the ability to collapse common parts of a grammar rule while still
maintaining the perspicuity of the grammar.

2) the ability to collapse similar rules by passing arguments (as with
SENDR).

3) the ease of interfacing other types of knowledge (in SOPHIE, primarily
the semantic network) into the parsing process.

4) the ability to build and save arbitrary structures during the parsing
process.(21)

In addition to the advantages it shares with other procedural
representations, the LISP encoding has the computational advantage of being
compilable directly into efficient machine code. The LISP implementation
is efficient because the notion of process it contains (one process doing
recursive descent) is close to that supported by physical machines, while
those of ATN and PROGRAMMAR are non-deterministic and hence not directly
translatable into present architecture. (See (Burton 1975) for a
description of how it is possible to minimize this mismatch.) Appendix B
describes the details of the LISP implementation and provides an example of
a rule from the grammar.

In terms of efficiency, the LISP implementation of the semantic
grammar succeeds admirably. The grammar written in INTERLISP (Teitelman
1974) can be block compiled. Using this technique, the complete parser
takes about 5K of storage and parses a typical student statement consisting
of 8 to 12 words in around 150 milliseconds! Appendix C presents parses
and timings of some of the sentences used in the dialogue.

(21) This ability is sometimes provided by allowing augments on phrase
structure rules.

                               Chapter 5

A NEW FORMALISM -- SEMANTIC AUGMENTED TRANSITION NETWORKS

Using the techniques described in Chapter 4, a natural language
front-end capable of supporting the dialogue presented in Chapter 3, and
requiring less than 200 milliseconds cpu time per question, was
constructed. In addition, these same techniques were used to build a
front-end for NLS-SCHOLAR (Grignetti et al. 1974; Grignetti et al. 1975)
(built by C. Hausmann) and an interface to an experimental laboratory for
exploring mathematics using attribute blocks (Brown et al. 1976). In the
construction of these varying systems, the notion of semantic grammar
proved to be useful. The LISP implementation, however, was found to be a
bit unwieldy. While expressing the grammar as programs has benefits in the
area of efficiency and allows complete freedom to explore new extensions,
the technique is lacking in perspicuity. This lack of perspicuity has
three major drawbacks: (1) the difficulty encountered when trying to
modify or extend the grammar; (2) the problem of trying to communicate the
extent of the grammar to either a user or a colleague; (3) the problem of
trying to re-implement the grammar on a machine that does not support LISP.
These difficulties have been partially overcome by using a second, parallel
representation of the grammar in a BNF-like specification language which is
the representation we have been presenting throughout this report. This,
however, requires supporting two different representations of the same
information and does not really solve problems (1) or (3). The solution
to this problem is a better formalism for expressing and thinking about
semantic grammars.

Augmented Transition Networks (ATN)

Some years ago, Chomsky (1957) introduced the notion that the
processes of language generation and language recognition could be viewed
in terms of a machine. One of the simplest such models is the finite
state machine. It starts off in its initial state looking at the first
symbol, or word, of its input sentence and then moves from state to state
as it gobbles up the remaining input symbols. The sentence is accepted if
the machine stops in one of its final states after having processed the
entire input string; otherwise the sentence is rejected. A convenient way
of representing a finite state machine is as a transition graph, in which
the states correspond to the nodes of the graph and the transitions between
states correspond to its arcs. Each arc is labelled with a symbol whose
appearance in the input can cause the given transition.
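
The acceptance procedure just described can be sketched in Python as below. The transition graph is a toy example invented for the illustration (it is not a network from the report), and the machine is assumed to be deterministic.

```python
def accepts(transitions, start, finals, sentence):
    """Run a deterministic finite state machine over the words of a sentence."""
    state = start
    for word in sentence:
        state = transitions.get((state, word))
        if state is None:            # no arc labelled with this word
            return False
    return state in finals           # must stop in a final state

# Toy transition graph: each key is (state, arc label), each value the
# state that the arc leads to.
TRANSITIONS = {
    ("S0", "the"): "S1",
    ("S1", "voltage"): "S2",
    ("S1", "current"): "S2",
    ("S2", "at"): "S3",
    ("S3", "n4"): "S4",
    ("S3", "n5"): "S4",
}
FINALS = {"S4"}

print(accepts(TRANSITIONS, "S0", FINALS, "the voltage at n4".split()))  # True
print(accepts(TRANSITIONS, "S0", FINALS, "the voltage at".split()))     # False
```

The second sentence is rejected even though every word matched an arc, because the machine stops in S3, which is not a final state.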

In an augmented transition network, the notion of a transition graph
has been modified in three ways: (1) the addition of a recursion mechanism
that allows the labels on the arcs to be non-terminal symbols that
correspond to networks; (2) the addition of arbitrary conditions on the
arcs that must be satisfied in order for an arc to be followed; (3) the
inclusion of a set of structure building actions on the arcs, together with
a set of registers for holding partially built structures.(22) Figure 5.1
is a specification of a language for representing augmented transition
networks. The specification is given in the form of an extended
context-free grammar in which alternative ways of forming a constituent are
represented on separate lines and the symbol "+" is used to indicate
arbitrarily repeatable constituents.(23) The non-terminal symbols are
lower case English descriptions enclosed in angle brackets. All other
symbols, except "+", are terminals. Non-terminals not given in Figure 5.1
have names that should be self-explanatory.

                              Figure 5.1
                    A Language for Representing ATNs

<transition network> := (<arc set> <arc set>+)
<arc set> := (<state> <arc>+)
<arc> := (CAT <category name> <test> <action>+ <term act>)
         (WRD <word> <test> <action>+ <term act>)
         (PUSH <state> <test> <action>+ <term act>)
         (TST <arbitrary label> <test> <action>+ <term act>)
         (POP <form> <test>)
         (VIR <constituent name> <test> <action>+ <term act>)
         (JUMP <state> <test> <action>+)
<action> := (SETR <register> <form>)
            (SENDR <register> <form>)
            (LIFTR <register> <form>)
            (HOLD <constituent name> <form>)
            (SETF <feature> <form>)
<term act> := (TO <state>)



(22) This discussion follows closely a similar discusslbn Tn Woods . ( 1970 J 
to which the reader is referred. If the reader is familiar with the ATN 
formalism he/she may wis^ to skip to the section ''Advantages to the ATN 
Formal irm" . • / 

(2j) "+*' is use'i to mean 0 more occurrj^fices. Whil^' the accepted usage 
of is 1*or more, the accepted symbol \or 0 or ni6re. has not been 

used' td avoid confusion with the jjse of the symbol •/in the ATN formalism. 
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<form> := (GETR <register>) ,' ■ ' 

LEX 
■• •" * 

(GETF.<fonn> <feature>) ' 

(BUILDQ <fragment> <register>-^) • 

(LIST <fonn>-^) ; < 

(APPEND <foriD> <form>) 

(QUOTE <arbitrary structure>) 

The first element of each arc is a word indicating the type of arc.
For CAT, WRD and PUSH arcs, the arc type together with the second element
corresponds to the label on an arc of a state transition graph. The third
element is an additional test. A CAT arc can be followed if the current
input symbol is a member of the lexical category named on the arc and if
the test on the arc is satisfied. A PUSH arc causes a recursive invocation
of a lower level network beginning at the state indicated, if its test is
satisfied. The WRD arc can be followed if the current input symbol is the
word named on the arc and if the test is satisfied. The TST arc can be
followed if the test is satisfied (the label is ignored). The VIR arc
(virtual arc) can be followed if a constituent of the named type has been
placed on the hold list by a previous HOLD action and the constituent
satisfies the test. In all of these arcs, the actions are structure
building actions, and the terminal action specifies the state to which
control is passed as a result of the transition. After CAT, WRD and TST
arcs, the input is advanced; after VIR and PUSH arcs it is not. The JUMP
arc can be followed whenever its test is satisfied, control being passed to
the state specified in the second element of the arc without advancing the
input. The POP arc indicates the conditions under which the state is to be
considered a final state and the form of the constituent to be returned.
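
As a rough illustration of how these arc types drive recognition, the
following Python sketch walks a toy network that uses only CAT, WRD, PUSH,
JUMP and POP arcs. It is not the SOPHIE grammar or the ATN compiler: the
lexicon, the state names (TERM, TERM/1, PART, ...) and the functions walk
and accepts are assumptions made for this example, and tests, actions,
registers and the TST and VIR arcs are left out.

```python
# Toy lexicon and network; every name here is a hypothetical illustration.
LEXICON = {"the": {"DET"}, "base": {"NOUN"}, "collector": {"NOUN"},
           "q5": {"PART"}}

NETWORK = {
    "TERM":   [("CAT", "DET", "TERM/1")],
    "TERM/1": [("CAT", "NOUN", "TERM/2")],
    "TERM/2": [("WRD", "of", "TERM/3"),    # "of <part>" follows
               ("JUMP", "TERM/4")],        # or the part is left out
    "TERM/3": [("PUSH", "PART", "TERM/4")],
    "TERM/4": [("POP",)],
    "PART":   [("CAT", "PART", "PART/1")],
    "PART/1": [("POP",)],
}


def walk(state, words, pos):
    """Yield every input position at which the network entered at `state` can POP."""
    for arc in NETWORK[state]:
        kind = arc[0]
        if kind == "POP":                   # final state: report where we stopped
            yield pos
        elif kind == "JUMP":                # change state without consuming input
            yield from walk(arc[1], words, pos)
        elif kind == "PUSH":                # recursive call on a lower network
            for sub_end in walk(arc[1], words, pos):
                yield from walk(arc[2], words, sub_end)
        elif pos < len(words):              # CAT and WRD consume one word
            word = words[pos]
            if kind == "WRD":
                matched = word == arc[1]
            else:                           # kind == "CAT"
                matched = arc[1] in LEXICON.get(word, set())
            if matched:
                yield from walk(arc[2], words, pos + 1)


def accepts(sentence, start="TERM"):
    """True if some path through the network consumes the whole sentence."""
    words = sentence.lower().split()
    return any(end == len(words) for end in walk(start, words, 0))
```

Under these assumptions, accepts("the base of Q5") and
accepts("the collector") succeed, while accepts("the base of") fails
because the PUSH to PART finds no input left to consume.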

The actions, forms and tests on an arc may be arbitrary functions of
the register contents. Figure 5.1 presents a useful set that illustrates
major features of the ATN. The first three actions specified in Figure 5.1
cause the contents of the indicated register to be set to the value of the
indicated form. SETR causes this to be done at the current level of
computation, SENDR at the next lower level of embedding, so that
information can be sent down during a PUSH, and LIFTR at the next higher
level of computation, so that additional information can be returned to
higher levels. The HOLD action places a form on the HOLD list to be used
at a later place in the computation by a VIR arc. SETF provides a means of
setting a feature of the constituent being built.

GETR is a function whose value is the contents of the named register.
LEX is a form whose value is the current input symbol. The asterisk (*) is
a form whose value depends on the context of its use: (1) in the actions
of a CAT arc, the value of * is the root form of the current input word;
(2) in the actions of a PUSH arc, it is the value of the lower
computation; and (3) in the actions following a VIR arc, the value of *
is the constituent removed from the HOLD list. GETF is a function which
determines the value of a specified feature of the indicated form (which is
usually *). BUILDQ is a general structure-building form that places the
values of the given registers into a specified tree fragment.
Specifically, it replaces each occurrence of + in the tree fragment with
the contents of one of the registers (the first register replacing the
first occurrence of +, the second register the second, etc.). In addition,
BUILDQ replaces occurrences of * by the value of the form *. The remaining
three forms make a list out of the specified arguments (LIST), append two
lists together to make a single list (APPEND), and produce as a value the
(unevaluated) argument form (QUOTE).
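
The substitution performed by BUILDQ can be sketched in a few lines of
Python. This is only an approximation under stated assumptions: tree
fragments are modelled as nested Python lists, registers as a dictionary,
and the function name buildq, its parameters, and the register names QUANT
and PLACE are all hypothetical.

```python
def buildq(fragment, registers, names, star=None):
    """Fill a nested-list tree fragment.

    Each "+" is replaced, left to right, by the contents of the next
    register listed in `names`; each "*" is replaced by `star`.
    """
    values = iter(registers[name] for name in names)

    def fill(node):
        if isinstance(node, list):
            return [fill(child) for child in node]
        if node == "+":
            return next(values)
        if node == "*":
            return star
        return node

    return fill(fragment)


# Hypothetical register contents for a measurement phrase.
registers = {"QUANT": "VOLTAGE", "PLACE": ["BASE", "Q5"]}
tree = buildq(["MEASURE", "+", "+"], registers, ["QUANT", "PLACE"])
```

Here tree becomes ["MEASURE", "VOLTAGE", ["BASE", "Q5"]]: the first "+"
takes the first named register and the second "+" the second.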

Advantages of ATN Formalism

The ATN formalism was seriously considered at the beginning of the
SOPHIE project, but rejected as being too slow. In the course of
developing the LISP grammar, it became clear that the primary reason for a
significant difference in speed between an ATN grammar and a LISP grammar
is that processing the ATN is an interpreted process, whereas LISP is
compilable, and therefore the time problem could be overcome by building
an ATN compiler. During the period of evolution of SOPHIE's grammar, an
ATN compiler was constructed (see Burton 1976). In the next section we
will discuss the advantages we hoped to gain by using the ATN formalism.

These advantages fall into three general areas: (1) conciseness, (2)
conceptual effectiveness and (3) available facilities. By conciseness we
mean that writing a grammar as an ATN takes fewer characters than LISP.
The ATN formalism gains conciseness by not requiring the specification of
details in the parsing process at the same level required in LISP. Most of
these differences stem from the fact that the ATN assumes it has a machine
whose operations are designed for parsing, while LISP assumes it has a
lambda calculus machine. For example, a lambda calculus machine assumes a
function has one value. A function call to look for an occurrence of a
non-terminal while parsing (in ATN formalism, a PUSH) must return at least
two values: the structure of the constituent found, and the place in the
input where the parsing stopped. A good deal of complexity is added to the
LISP rules in order to maintain the free variable which has to be
introduced to return the structure of the constituent. Other examples of
unnecessary details include the binding of local variables, and the
specification of control structure as ANDs, ORs and CONDs.

The conciseness of the ATN results in a grammar that is easier to
change, easier to write and debug, easier to understand, and hence to
communicate. We realize that conciseness does not necessarily lead to
these results (APL being a prime example in computer languages,
mathematics in general being another); however, this is not a problem. The
correspondence between the grammar rules in LISP and ATN is very close.
The concepts which were expressed as LISP code can be expressed in nearly
the same way as ATNs but in fewer symbols.

The second area of improvement deals with conceptual effectiveness.
Loosely defined, conceptual effectiveness is the degree to which a language
encourages one to think about problems in the right way. One example of
conceptual effectiveness can be seen by considering the implementation of
case structured rules. (24) In a typical case structure rule, the verb
expresses the function (or relation name), while the subject, object and
prepositional phrases express the arguments of the function or relation.
Let us assume for the purpose of this discussion that we are looking at
four different cases (agent, location, means, and time) of the verb GO:
John went to the store by car at 10 o'clock. In a phrase structure
rule-oriented formalism one would be encouraged to write:

<statement> := <actor> <action/verb> <location> <means> <time>

Since the last three cases can appear in any order, one must also write 5
other rules, such as:

<statement> := <actor> <action/verb> <location> <time> <means>

(24) See Bruce (1975) for a discussion of case systems.



In an ATN one is inclined towards:

[ATN diagram not reproduced; the one surviving arc label is PUSH location]




which expresses more clearly the case structure of the rule. There is no
reason why in the LISP version of the grammar one couldn't write loops that
are exactly analogous to the ATN (the ATN compiler, after all, produces
such code!); however, a rule-oriented formalism does not encourage one to
think this way. An alternative rule implementation is:

<action>  := <actor> <action/verb> <action1>
<action1> := <action1> <temporal>
<action1> := <action1> <location>
<action1> := <action1> <means>

This is easier (shorter) to write, but it has the disadvantage of being
left-recursive. To implement it, one is forced to write the LISP
equivalent of the ATN, which creates a difference between the rule
representation and the actual implementation. This method also has the
disadvantage of introducing an unmotivated non-terminal.
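
The difference between enumerating rule orders and looping over cases can
be sketched in Python. The sketch below is an assumption-laden toy: it
identifies each case by a single preposition (to, by, at), handles only
the part of the sentence after the verb, and uses hypothetical names
throughout. It accepts the three cases in any order with one loop, and
separately counts the orderings a phrase structure grammar would need.

```python
from itertools import permutations

# Toy mapping from preposition to case name (an assumption for this sketch).
CASE_WORDS = {"to": "location", "by": "means", "at": "time"}


def parse_cases(words):
    """Accept prepositional cases in any order, each at most once.

    Returns a dict from case name to phrase, or None if the input fails.
    """
    cases, i = {}, 0
    while i < len(words):
        case = CASE_WORDS.get(words[i])
        if case is None or case in cases:
            return None
        j = i + 1
        while j < len(words) and words[j] not in CASE_WORDS:
            j += 1
        if j == i + 1:                  # preposition with no phrase after it
            return None
        cases[case] = " ".join(words[i + 1:j])
        i = j
    return cases


# One phrase structure rule would be needed per ordering of the three cases.
rule_orders = list(permutations(["location", "means", "time"]))
```

The single loop covers all six orderings in rule_orders, which mirrors the
way a state with looping PUSH arcs expresses the case structure once.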

Another conceptual advantage of the ATN framework is that it
encourages the postponing of decisions about a sentence until a
differential point is reached, thereby allowing potentially different paths
to stay together. In the rule-oriented SOPHIE grammar there are top level
rules for <set>, a command to change one of the control settings, and
<modify>, a command to fault the instrument in some way. Sentence (1) is a
<set> and sentence (2) is a <modify>.

(1) Suppose the current control is high.
(2) Suppose the current control is shorted.

The two parse paths for these sentences should be the same for the first
five words, but they are separated immediately by the rules <set> and
<modify>. (25) An ATN encourages structuring the grammar so that the
decision between <set> and <modify> is postponed and the paths remain
together. It could be argued that the fact that this example occurred in
SOPHIE's grammar is a complaint against top-down parsing, or semantic
grammars, or just our particular instantiation of a semantic grammar. We
suspect the latter but argue that rule representations encourage this type
of behavior.

Another conceptual aid provided by ATNs is their method of handling
ambiguity. Our LISP implementation uses a recursive descent technique
(which can alternatively be viewed as allowing only one process). This
requires that any decision between two choices be made correctly, because
there is no way to try out the other choice after the decision is made. At
choice points, a rule can, of course, "look ahead" and gain information on
which to base the decision, similar to the "wait-and-see" strategy used by
Marcus (1975), but there is no way to back up and remake a decision once it
has returned.

The effects of this can be most easily seen by considering the lexical
aspects of the parsing. A prepass collapses compound words, expands
abbreviations, etc. This allows the grammar to be much simpler because it
can look for units like "voltage/control" instead of having to decode the
noun phrase "voltage control". Unfortunately, without the ability to handle
ambiguity, this rewriting can only be done on words that have no other
possible meaning. So, for example, when the grammar is extended to handle:

(3) Does the voltage control the current limiting section?

the compound "voltage/control" would have to be removed from the prepass
rules and included in the grammar. This reduces the amount of bottom-up
processing that can be done and results in a slower parse. It also makes
compound rules difficult to write because all possible uses of the
individual words must be considered to avoid errors. Another example is
the use of the letter "C" as an abbreviation. Depending on context, it
could possibly mean either current, collector or capacitor. Without
allowing ambiguity in the input, it could not be allowed as an
abbreviation unless explicitly recognized by the grammar.

(25) The degree to which the separation of paths is a problem can be
greatly reduced using a preprocessing "compilation" stage such as that of
Klovstad, which (among other things) collapses rules with the same initial
parts. In our example, however, this may not work, since the phrase "the
current control" may be parsed as the non-terminal <CONTROL> in (1) and as
the non-terminal <PART> in (2). Of course this would be a poor choice of
grammar rules, and no one aware of sentences (1) and (2) would handle it
this way. The problem is recognizing where situations such as this occur.
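
One way to picture input that tolerates lexical ambiguity is a chart of
edges, where a compound adds an alternative edge instead of replacing the
words it spans. The Python sketch below is a minimal illustration under
that assumption; the function names, the edge representation
(start, end, item) and the single compound rule are hypothetical.

```python
# Toy prepass rule: a word pair and the compound item it may stand for.
COMPOUNDS = {("voltage", "control"): "voltage/control"}


def build_chart(words):
    """Return edges (start, end, item): one per word, plus one per compound match."""
    edges = [(i, i + 1, word) for i, word in enumerate(words)]
    for i in range(len(words) - 1):
        pair = (words[i], words[i + 1])
        if pair in COMPOUNDS:
            edges.append((i, i + 2, COMPOUNDS[pair]))
    return edges


def readings(edges, end, start=0):
    """Enumerate every sequence of lexical items spanning start..end."""
    if start == end:
        return [[]]
    found = []
    for s, e, item in edges:
        if s == start:
            found += [[item] + rest for rest in readings(edges, end, e)]
    return found


words = "does the voltage control the current".split()
all_readings = readings(build_chart(words), len(words))
```

With the chart, both readings survive the prepass: one keeps "voltage"
and "control" as separate words (needed for sentence (3)), and the other
offers "voltage/control" as a unit. The parser, not the prepass, decides
which one fits.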

The third general area in which ATNs have an advantage is in the
available facilities to deal with complex linguistic phenomena. While our
grammar has not yet expanded to the point of requiring any of the
facilities, the availability of such facilities cannot be ignored as an
argument favoring one approach over another. A primary example is the
general mechanism for dealing with coordination in English described in
Woods (1973a).

Conversion to Semantic ATN

For the reasons discussed above, the SOPHIE semantic grammar was
re-written in the ATN formalism. We wish to stress here that the
re-writing was a process of changing form only. The content of the grammar
remained the same. Since a large part of the knowledge encoded by the
grammar continues to be semantic in nature, we call the resulting grammar a
"semantic ATN". Figure 5.1 presents the graphic ATN representation of a
semantic grammar non-terminal. This is the same rule presented in Figure
4.1, which recognizes the phrases for specifying measurements in a circuit.
The actions and structure building operations on the arcs (which are not
shown in Figure 5.1) save the recognized constituents and construct the
proper interpretation when sufficient information has been collected.
Appendix E provides more examples of the semantic ATN used in SOPHIE.

Figure 5.2 presents a simple example of how the recognition of
anaphoric deletions can be captured in ATN formalism. The network in
Figure 5.2 encodes the straightforward way of expressing a terminal of a
part in the circuit: the base of Q5, the anode of it, the collector. By
the state TERMINAL/TYPE, both the determiner and the terminal type (base,
anode) have been found. The first arc that leaves TERMINAL/TYPE accepts the
preposition that begins the specification of the part. The second arc
(JUMP arc) corresponds to hypothesizing that the specification of the part
has been deleted, as in: "The base is open." The action on the arc builds
a place-holding form which identifies the deletion and specifies (from
information associated with the terminal type which was found) the classes
of objects that can fill the deletion. The method for determining the
referent of the deletion remains the same as described in Chapter 4.

                              Figure 5.2
                   An ATN which recognizes deletion

The SOPHIE semantic ATN is then compiled using the general ATN
compiling system described in Burton (1976). The SOPHIE grammar provides
the compiling system with a good contrast to the LUNAR grammar, since it
does not use many of the potential features. In addition, a benchmark of
sorts was available from the LISP implementation of the grammar that could
be used to determine the computational cost of using the ATN formalism.

There were two modifications made to the compiling system to improve
its efficiency for the SOPHIE application. In the SOPHIE grammar, a large
number of the arcs check for the occurrence of particular words. When
there is more than one arc leaving a state, the ATN formalism requires
that all of these arcs be tried, even if more than one of these is a WRD
arc and an earlier WRD arc has succeeded. This is especially costly, since
the taking of an arc requires the creation of a configuration to try the
remaining arcs. In those cases when it is known that none of the other
arcs can succeed, this should be avoided. As a solution to this problem,
the GROUP arc type was added. The GROUP arc allows a set of contiguous
arcs to be designated as mutually exclusive. The form of the GROUP arc is:
(GROUP arc1 arc2 ... arcn). The arcs are tried, one at a time, until the
conditions on one of the arcs are met. This arc is then taken, and the
remaining arcs in the GROUP are forgotten (not tried). If a PUSH arc is
included in the GROUP, it will be taken if its test is true and the
remaining arcs will not be tried even if the PUSHed for constituent is not
found. For example, consider the following grammar state:

(S/1
   (GROUP (CAT A T (TO S/2))
          (WRD X T (TO ...))
          (CAT B T (TO ...))))


At most, one of the three arcs will be followed. Without GROUPing them
together, it is possible that all three might be followed, if the word X
had interpretations as both category A and category B.
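
The contrast between ordinary arc selection and a GROUP can be sketched as
follows. This Python fragment is an illustration only: tests and actions
are omitted, the word "x" is assumed to belong to both categories A and B,
and the target states S/3 and S/4 are placeholders invented for the
example, since the targets of the second and third arcs are not legible in
the state shown above.

```python
# Toy lexicon: the word "x" has interpretations as category A and category B.
LEXICON = {"x": {"A", "B"}}

# (arc type, label, target state); S/3 and S/4 are hypothetical targets.
ARCS = [("CAT", "A", "S/2"), ("WRD", "x", "S/3"), ("CAT", "B", "S/4")]


def arc_matches(arc, word):
    """True if a CAT or WRD arc can be followed on `word` (tests omitted)."""
    kind, label, _target = arc
    if kind == "WRD":
        return word == label
    if kind == "CAT":
        return label in LEXICON.get(word, set())
    return False


def follow_all(arcs, word):
    """Plain ATN behaviour: every matching arc yields a new configuration."""
    return [arc[2] for arc in arcs if arc_matches(arc, word)]


def follow_group(arcs, word):
    """GROUP behaviour: take the first matching arc and forget the rest."""
    for arc in arcs:
        if arc_matches(arc, word):
            return [arc[2]]
    return []
```

For the word "x", follow_all produces three configurations while
follow_group produces only the first, which is where the saving in time and
space comes from, and also why a GROUP can miss a parse that a later arc
would have found.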

The GROUP arc also provides an efficient means of encoding optional
constituents. The normal method of allowing options in ATN is to provide
an arc that accepts the optional constituent and a second arc that jumps to
the next state without accepting anything. For example, if in state S/2
the word "very" is optional, the following two arcs would be created:

(WRD VERY T (TO REST-OF-S/2))
(JUMP REST-OF-S/2 T)

The inefficiency arises when the word "very" does occur. The first arc is
taken, but an alternative configuration that will try the second arc must
be created, and possibly later explored. By embedding these arcs in a
GROUP, the alternative will not be created, thus saving time and space. As
a result, it won't have to be explored, possibly saving more time. A
warning should be included here, that the GROUP arc can reject sentences
that might otherwise be accepted. In our example, "very" may be needed to
get out of the state REST-OF-S/2. In this respect, the GROUP arc is a
departure from the original ATN philosophy that arcs should be independent,
and for this we apologize. However, for some applications, the increased
efficiency can be critical.



The other change to the compiling system (for the semantic grammar
application) dealt with the preprocessing operations. The preprocessing
facilities described in the last chapter included: 1) lexical analysis to
extract word endings; 2) a substitution mechanism to expand abbreviations,
delete noise words, and canonicalize synonyms; 3) dictionary retrieval
routines; and 4) a compound word mechanism to collapse multi-word phrases.
For the SOPHIE application we added the ability to use the INTERLISP
spelling correction routines and the ability to derive word definitions
from SOPHIE's semantic net. The extraction of definitions from the
semantic network for part names and node names reduces the size of the
dictionary and simplifies the operations of changing circuits. In
addition, a mechanism called MULTIPLES was developed that permits string
substitution within the input. This is similar to the notion of
compounding, but differs in that a compound rule creates an alternative
lexical item while the multiple rule creates a different lexical item.
After the application of a compound rule, there is an additional edge in
the input chart; after a multiple rule, the effect is the same as if the
user had typed in a different string.
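
The distinction between the two mechanisms can be sketched in Python. The
code is a simplified picture, not the SOPHIE preprocessor: the function
names, the edge representation (start, end, item) and the sample rules are
assumptions, and patterns are assumed to be non-empty word lists.

```python
def apply_multiple(words, pattern, replacement):
    """MULTIPLES-style rule: rewrite the input itself.

    The matched words are gone afterwards, as if the user had typed the
    replacement. `pattern` must be a non-empty list of words.
    """
    out, i, n = [], 0, len(pattern)
    while i < len(words):
        if words[i:i + n] == pattern:
            out.extend(replacement)
            i += n
        else:
            out.append(words[i])
            i += 1
    return out


def apply_compound(words, pattern, item):
    """Compound-style rule: keep every word edge and add alternative edges."""
    n = len(pattern)
    edges = [(i, i + 1, word) for i, word in enumerate(words)]
    edges += [(i, i + n, item) for i in range(len(words) - n + 1)
              if words[i:i + n] == pattern]
    return edges
```

For instance, a hypothetical multiple rule turning "v out" into "output
voltage" leaves no trace of the original words, whereas a compound rule
for "voltage control" leaves both single-word edges in place alongside the
new "voltage/control" edge.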

Fuzziness

The one aspect of the LISP implementation that has not been
incorporated into the ATN framework is fuzziness, the ability to ignore
words in the input. While we have not worked out the details, the
non-determinism provided by ATNs lends itself to an interesting approach.
In a one-process (recursive descent) implementation, the rule that
checks for a word must decide (with information passed down from higher
rules) whether to try skipping a word, or give up. The critical
information that is not available when this decision has to be made is
whether or not there is another parse that would use that word. In the
ATN, it is possible to suspend a parse and come back to it after all other
paths have been tried. Fuzziness could be implemented so that rather than
skip a word and continue, it can skip a word and suspend, waiting for the
other parses to fail or suspend. The end effect may well be that sentences
are allowed to get fuzzier because there is no danger of missing the
correct parse.
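
Since the details were not worked out, the following Python sketch is
purely speculative. It shows one way the "skip a word and suspend" idea
could be organized, with an active queue for paths that use every word and
a suspended queue for paths that have skipped one. To stay short it
replaces the grammar with a list of flat word patterns; the function name
fuzzy_parse, the max_skips limit and the sample patterns are all
assumptions.

```python
from collections import deque


def fuzzy_parse(words, patterns, max_skips=2):
    """Return (pattern, skipped_words) for the first parse found, else None.

    A path that skips a word is suspended and only resumed once every
    active path has failed or been suspended.
    """
    active = deque((pattern, 0, 0, ()) for pattern in patterns)
    suspended = deque()
    while active or suspended:
        if not active:                       # all active paths are exhausted
            active.append(suspended.popleft())
        pattern, k, i, skipped = active.popleft()
        if k == len(pattern) and i == len(words):
            return pattern, list(skipped)
        if i >= len(words):
            continue
        if k < len(pattern) and words[i] == pattern[k]:
            active.append((pattern, k + 1, i + 1, skipped))
        if len(skipped) < max_skips:         # skipping is kept as a suspended path
            suspended.append((pattern, k, i + 1, skipped + (words[i],)))
    return None


PATTERNS = [("what", "is", "the", "output", "voltage"),
            ("what", "is", "the", "voltage")]
```

With these patterns, "what is the voltage" parses without skipping
anything, and "what is the darn output voltage" is only matched after the
exact paths fail and the suspended path that skipped "darn" is resumed.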



The original motivation for changing to the ATN was its perspicuity.
Appendices A and B show the BNF/LISP version, which can be compared with
Appendix E, which shows the ATN version. We find that neither of them is
particularly readable, but then there is no reason to expect that this
should be the case. As Winograd (1973) has pointed out, simple grammars
are perspicuous in almost any formalism; complex grammars are still
complex in any formalism. We found the ATN formalism much easier to think
in, write in, and debug. The examples of redundant processing that were
presented earlier in this chapter were discovered while converting to ATN.
For a gross comparison on conciseness, the ATN grammar requires 70% fewer
characters to express than the LISP version.

The efficiency results were surprising. Table 5.1 gives comparison
timings between the LISP version and the ATN compiled version. As can be
seen, the ATN version is more than twice as fast. This was pleasantly
counter-intuitive, as we expected the LISP version to be much faster due to
the amount of hand optimization that had been done while encoding the
grammar rules. In presenting the comparison timings, it should be mentioned
that there are three differences between the two systems that tended to
favor the ATN version. (26) One difference was the lack of fuzziness in the
ATN version. The LISP version spent time testing words other than the
current word, looking ahead to see if it were possible to skip this word,
which was not done in the ATN version. The second is the creation of
categories for words during the preprocessing in the ATN version, which
reduced the amount of time spent accessing the semantic net and hence
reduced the time required to perform a category membership test in the ATN
system. The third was the simplification of the grammar and increase in
the amount of bottom-up processing that could be done because of the
ambiguity allowed in the input chart. In our estimation, the lack of
fuzziness is the only difference that may have had a significant effect,
and this can be included explicitly in the ATN in places where it is
critical, by using TST arcs and suspend actions, without a noticeable
increase in processing time. In conclusion, we are very pleased with the
results of the compiled semantic ATN and feel that the ATN compiler makes
the ATN formalism computationally efficient enough to be used in real
systems.

(26) The exact extent to which each of these differences contributed is
difficult to gather statistics on due to the block compiler, which gains
efficiency by hiding internal workings. The exact contribution of each
could certainly be determined but was not deemed worth the effort.

                              Table 5.1
               Comparison of ATN vs LISP Implementation
             Times (in seconds) are "prepass" + "parsing"

1) What is the output voltage?

      LISP - .024 + .018 = .042
      ATN  - .048 + .033 = .081

2) What is the voltage between there and the base of Q6?

      LISP - .038 + .039 = .077
      ATN  - .090 + .046 = .136

3) Q5?

      LISP - .010 + .046 = .056
      ATN  - .013 + .060 = .073

4) What was the output voltage when the voltage control is set to .5?

      LISP - .045 + .038 = .083
      ATN  - .096 + .0?8 = [total illegible]

5) If Q6 has an open emitter and a shorted base collector junction what
   happens to the voltage between its base and the junction of the voltage
   limiting section and the voltage reference source?

      LISP - .206 + .188 = .394
      ATN  - .259 + .090 = .349



Chapter 6 
OBSERVATIONS ON STUDENT USAGE

When we began developing a natural language processor for an
instructional environment, we knew it had to be (1) fast, (2) habitable,
and (3) self-teaching. The basic conclusion that has arisen from the work
presented here is that it is possible to satisfy these constraints. The
notion of semantic grammar (presented in Chapter 4) provides a paradigm for
organizing the knowledge required in the understanding process that permits
efficient parsing. In addition, semantic grammar aids the habitability by
providing insights into a useful class of dialogue constructs, and permits
efficient handling of such phenomena as pronominalizations and ellipses.
The need for a better formalism for expressing semantic grammars led to
the use of Augmented Transition Networks (presented in Chapter 5). The
ability of the ATN-expressed semantic grammar to satisfy the above stated
requirements is demonstrated in the natural language front-end for the
SOPHIE system.

A point that needs to be stressed is that the SOPHIE system has been
(and is being) used by uninitiated students in experiments to determine the
pedagogical effectiveness of its environments. While much has been learned
about the problems of using a natural language interface, these experiments
were not "debugging" sessions for the natural language component. The
natural language component has unquestionably reached a state at which it
can be conveniently used to facilitate learning about electronics. In this
chapter, we will describe the experiences of students using the natural
language component, and present some ideas on handling erroneous inputs.

Impressions, Experiences and Observations

Prior to any exposure to SOPHIE, a group of four students were asked
to write down all of the ways they could think of for requesting the
voltage at a particular node. Although the intent of the experiment was to
determine the range of paraphrases that students might be inclined to use
before they were aware of the system's linguistic limitations, a more
interesting result emerged. Each student wrote down one phrasing very
quickly but had a difficult time thinking of a second, even though the
initial phrasings by three of the students were in fact different! One
student quit, exclaiming "But there is only one way to ask that!" This
same inability to perform linguistic paraphrase carried over to the actual
interaction with SOPHIE via terminal. Whenever the system did not accept a
query, there was a marked delay before the student tried again. Sometimes
the student would abandon his line of questioning completely. At the same
time, data collected over many sessions indicated that there was no
standard (canonical) way to phrase a question. Table 6.1 provides some
examples of the range of phrasings used by students to ask for the voltage
at a node.



                              Table 6.1
                        Sample Student Inputs

The following are some of the input lines typed by students with the intent
of discovering the voltage at a node in the circuit.

      What is the voltage at node 1?
      What is the voltage at the base of Q5?
      How much voltage at N10?
      And what is the voltage at N1?
      N9?
      V at the neg side of C6?
      V11 is?
      What is the voltage from the base of transistor Q5 to ground?
      What V at N16?
      Coll. of Q5?
      Node 16 Voltage?
      What is the voltage at pin 1?
      Output?



As Table 6.1 shows, students are likely to conceive of their questions in
many ways and to express each of these conceptions in any of several
phrasings. Yet other experiences indicate that they lack the ability to
easily convert to another conceptualization or phrasing. Since the
non-acceptance of questions creates a major interruption in the student's
thought process, the acceptance of many different paraphrases is critical
to maintaining flow in the student's problem solving.

Another interesting phenomenon that occurred during sessions was the
change in the linguistic behavior of the students as they used the system.
Initially, queries were stated as complete English questions, generally
stated in templates created by the students from the written examples of
sessions that we had given them. If they needed to ask something that did
not exactly fit one of their templates, they would try a minor variant. As
they became more familiar with the mode of interaction, they began to use
abbreviations, to leave out parts of their questions and, in general, to
assume that the system was following their interaction. After five hours
of experience with the system, almost all of one student's queries
contained abbreviations and one in six depended on the context established
by previous statements.



Feedback - When the Grammar Fails

From our experiences with students using SOPHIE, we have been
impressed with the importance of providing feedback to unacceptable inputs
(what to do when the system doesn't understand an input). While it may
appear that in a completely habitable system all inputs would be
understood, no system has ever attained this goal and none will in the
foreseeable future. To be natural to a naive user, an intelligent system
should act intelligently when it fails too. The first step towards having
a system fail intelligently is the identification of possible areas of
error. In students' use of the SOPHIE system, we have found the following
types of errors to be common:

(1) Spelling errors and mis-typings - "Shortt the CE og Q3 and opwn its
    base"; "What isthe vbe [remainder illegible]"

(2) Inadvertent omissions - "What is the BE of [part name illegible]?"
    (The user left out the quantity to measure. Note that in other
    contexts this is a well formed question.)

(3) Slight misconceptions that are predictable - "What is the output of
    transistor Q3?" (The output of a transistor is not defined); "What
    is the current thru node 1?" (Nodes are places where voltage is
    measured and may have numerous wires associated with them); "What is
    R9?" (R9 is a resistor); "Is Q5 conducting?" (The laboratory section
    of SOPHIE gives information that is directly available from a real lab
    such as currents and voltages.)

(4) Gross misconceptions whose underlying meaning is well beyond designed
    system capabilities - "Make the output voltage 30 volts"; "Turn on the
    power supply and tell me how the unit functions"; "What time is it?".

The best technique for dealing wllh each type of error is an open problem. 
In> the remainder of this section, wd will discuss th^i solutions used in the 
SOPHIE system to provide feedback. 

The use of a spelling correction algorithm (borrowed from INTERLISP)
has proven to be a satisfactory solution to type 1 errors. During one
student's session, spelling correction was required on, and resulted in
proper understanding of, 10% of the questions. The major failings of the
INTERLISP algorithm are the restriction on the size of the target set of
correct words (time increases linearly with the number of words) and its
failure to correct run-on words. (The time required to determine if a word
may be two (possibly misspelled) words run together increases very quickly
with the length of the word and the number of possibly correct words. With
no context to restrict the possible list of words, the computation involved
is prohibitive.) A potential solution to both shortcomings would be to use
the context of the parser to reduce the possibilities when it reaches the
unknown word. Because of the nature of the grammar, this would allow
semantic context as well as syntactic context to be used.

Of course, the use of any spelling correction procedure has some
dangers. A word that is spelled correctly but that the system doesn't know
may be changed through spelling correction to a word the system does know.
For example, if the system doesn't know the word "top" but does know "stop",
a user's command to "top everything" can be disastrously misunderstood.
For this reason, words like "stop" are not spelling corrected.
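
The two safeguards discussed above (restricting candidates by parser context and exempting dangerous words from correction) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's difflib similarity matching rather than the INTERLISP algorithm, and the vocabulary, the protected list, the `expected` parameter and the cutoff value are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; the actual SOPHIE word list is not given in the report.
VOCABULARY = {"short", "open", "voltage", "current", "stop",
              "base", "collector", "emitter"}
# Words that must never be produced by correction (e.g. "stop").
PROTECTED = {"stop"}


def correct(word, expected=None, cutoff=0.7):
    """Return a known word for `word`, or None if no safe correction exists.

    expected: optional set of words the parser could accept at this point;
    when given, only those words are considered as corrections.
    """
    word = word.lower()
    if word in VOCABULARY:
        return word
    candidates = VOCABULARY if expected is None else VOCABULARY & set(expected)
    candidates = candidates - PROTECTED
    matches = get_close_matches(word, sorted(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this table, "shortt" and "opwn" are mapped to "short" and "open", while "top" is left uncorrected because "stop" is protected. Passing `expected` shrinks the candidate set, which is the effect the parser context would have.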

Our solution to predictable misconceptions (type 3 errors) is to
recognize them and give error messages that are directed at correcting the
misconception. We are currently using two different methods of
recognition. One is to loosen up the grammar so that it accepts plausible
but meaningless sentences. This technique provides the procedural
specialists called by the plausible parse enough context to make relevant
comments. For example, the concept of current through a node is accepted
by the grammar even though it is meaningless. The specialist that performs
measurements must then check its arguments and provide feedback if
necessary:

>> WHAT IS THE CURRENT THRU NODE 4?

The current thru a node is not meaningful since by Kirchoff's law
the sum of the currents thru any node is zero. Currents can be
measured thru parts (e.g. CURRENT THRU C6) or terminals
(e.g. CURRENT THRU THE COLLECTOR OF Q2).

Notice that the response to the question presents some examples of how to
measure the currents along wires that lead into the mentioned node.
Examples of questions that will be accepted and are relevant to the
student's needs are among the best possible feedback.
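
A minimal sketch of the first method, assuming the loosened grammar hands the measurement specialist a quantity and a (kind, name) pair. The function name, the argument shapes and the tuple results are hypothetical; only the feedback text is taken from the dialogue above.

```python
NODE_CURRENT_FEEDBACK = (
    "The current thru a node is not meaningful since by Kirchoff's law "
    "the sum of the currents thru any node is zero. Currents can be "
    "measured thru parts (e.g. CURRENT THRU C6) or terminals "
    "(e.g. CURRENT THRU THE COLLECTOR OF Q2)."
)


def measure_specialist(quantity, place):
    """Check the arguments produced by a plausible parse before measuring.

    place is a (kind, name) pair such as ("NODE", 4) or ("PART", "C6").
    """
    kind, name = place
    if quantity == "CURRENT" and kind == "NODE":
        # The grammar accepted the phrase; the specialist explains the problem.
        return ("FEEDBACK", NODE_CURRENT_FEEDBACK)
    return ("MEASURE", quantity, name)
```

The grammar stays permissive and the semantic check lives in the specialist, which has the context needed to word a relevant reply.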



The second method of recognizing common misconceptions is to "key"
feedback off single words or groups of words. In the following examples,
the "keys" are "or" and "turned on". Notice that the response presents a
general characterization of the violated limitations as well as suggestions
for alternative lines of attack.

>> COULD Q1 OR Q2 BE SHORTED?

I can only handle one question, hypothesis, etc. at a time. The fact
that you say 'OR' indicates that you may be trying to express two
concepts in the same sentence. Maybe you can break your statement
into two or more simple ones.

>> IS THE CURRENT LIMITING TRANSISTOR TURNED ON?

The laboratory section of SOPHIE is designed to provide the same
elementary measurements that would be available in a real lab. If you
want to determine the state of a transistor, measure the pertinent
currents and voltages.

These methods of handling type 3 errors have proved to be very helpful.
However, they require that all of the misconceptions be predicted and
programmed for in advance. This limitation makes them inapplicable to
novel situations.
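
The keyed method can be sketched as a table of key phrases and canned responses that is consulted when the parse fails. The table layout and the function name are assumptions made for illustration, and the messages are abbreviated from the dialogue above.

```python
import re

# Hypothetical key table: (pattern, response). Every entry must be written
# in advance, which is exactly the limitation noted in the text.
KEYED_FEEDBACK = [
    (r"\bor\b",
     "I can only handle one question, hypothesis, etc. at a time."),
    (r"\bturned on\b",
     "The laboratory section of SOPHIE is designed to provide the same "
     "elementary measurements that would be available in a real lab."),
]


def keyed_feedback(sentence):
    """Return the response for the first key found in the sentence, else None."""
    for pattern, message in KEYED_FEEDBACK:
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return message
    return None
```

Word boundaries keep "or" from firing inside words such as "transistor" or "shorted". An input that contains no key gets no response at all.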


The most severe problems a user has stem from type 2 (omissions) and
type 4 (major misconceptions) errors. (Type 3 errors that haven't been
predicted are considered type 4 errors.) After a simple omission, the user
may not see that he has left anything out and may conclude that the system
doesn't know that concept or phrasing of that concept. For example, when
the user types "What is the BE of Q5?" instead of "What is the VBE of Q5?"
he may decide that it is unacceptable because the system doesn't allow
"VBE" as an abbreviation of "base emitter voltage". For type 4 errors, the
user may waste a lot of time and energy attempting several rephrasings of
his query, none of which can be understood because the system doesn't know
the concept the user is trying to express. For example, no matter how it
is phrased, the system won't understand "Make the output voltage 30 volts"
because measurements cannot be directly changed; only controls and
specifications of parts can be changed.

The feedback necessary to correct both of these classes of errors must
identify any concepts in the statement that are understood and suggest the
range of things that can be done to/with these concepts. For type 2
errors, this will help the user see his omission. For type 4 errors, it
may suggest alternative conceptualizations that will allow the user to get
at the same information (for example, to change the output voltage
indirectly by changing one of the controls) or at least provide him with
enough information to decide when to quit.

The notion of semantic grammar may be useful in developing a general
solution along the following lines: A bottom-up or island parsing scheme
could be used to identify well-formed constituents. (27) Since the grammar
is semantically based, the constituents that are found represent "islands"
of meaningful phrases. The ATN representation of the semantic grammar can
then be inspected to discover possible ways of combining these islands. If
a good match is found, the grammar can be used to generate a response that
indicates what other semantic parts are required for that rule. Even if no
good matches are found, a positive statement may be made that explains the
set of possible ways the recognized structures could be understood. Much
more work is required in the area of unacceptable inputs before natural
language systems will feel really natural to naive users.
(27) William Woods and Geoff Brown are presently refining such a bottom-up
parsing technique for ATN grammars for use in the BBN Speech project (Woods
1976).
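
The island-based proposal above is a direction for future work, not an implemented part of SOPHIE. The sketch below only illustrates the idea under strong simplifications: islands are single words, rules are flat lists of required semantic categories, and all category tests, rule names and function names are hypothetical.

```python
import re

# Hypothetical recognizers for three semantic categories.
CATEGORIES = {
    "meas/quant": lambda w: w in {"voltage", "current", "resistance", "power"},
    "junction/type": lambda w: w in {"eb", "be", "ec", "ce", "cb", "bc"},
    "transistor": lambda w: re.fullmatch(r"q\d+", w) is not None,
}

# Hypothetical rules: the semantic parts each one requires.
RULES = {
    "junction measurement": ["junction/type", "meas/quant", "transistor"],
    "part measurement": ["meas/quant", "transistor"],
}


def find_islands(sentence):
    """Return (category, word) pairs for every word that forms an island."""
    words = re.findall(r"[a-z0-9/]+", sentence.lower())
    return [(cat, w) for w in words
            for cat, test in CATEGORIES.items() if test(w)]


def suggest(sentence):
    """For each partially matched rule, list the semantic parts still missing."""
    found = {cat for cat, _ in find_islands(sentence)}
    suggestions = []
    for rule, parts in RULES.items():
        missing = [p for p in parts if p not in found]
        if missing and len(missing) < len(parts):
            suggestions.append((rule, missing))
    return suggestions
```

For the type 2 example "What is the BE of Q5?", the islands are a junction type and a transistor, and both rules report that a quantity to measure is missing. That is the kind of feedback that would help the user see his omission.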
Chapter 7
CONCLUDING DISCUSSION



Future Research Areas

The SOPHIE semantic grammar system is designed for a particular
context -- trouble shooting -- within a particular domain, namely
electronics. It represents the compilation of those pieces of knowledge
which are general (linguistic) together with specific domain dependent
knowledge. In its present form, it is unclear which knowledge belongs to
which area. The development of semantic grammars for other applications
and extensions to the semantic grammar mechanism to include other
understood linguistic phenomena will clarify this distinction.

While the work presented in this report has dealt mostly with one area
of application, the notion of semantic grammar as a method of integrating
knowledge into the parsing process has wider applicability. Two
alternative applications of the technique have been completed. One deals
with simple sentences in the domain of attribute blocks (Brown et al.
1975). While the sublanguage accepted in the attribute blocks environment
is very simple, it is noteworthy that within the semantic grammar paradigm,
a simple grammar was quickly developed that greatly improved the
flexibility of the input language. The other completed application deals
with questions about the editing system NLS (Grignetti et al. 1975). In
this application, most questions dealt with editing commands and their
arguments, and fit nicely into the case frame notion mentioned in Chapter
5. The case frame use of semantic grammar is being considered for, and may
have its greatest impact on, command languages. Command languages are
typically case centered around the command name that requires additional
arguments (its "cases"). The combination of the semantic classification
provided by the semantic grammar and the representation of case rules
permitted by ATNs should go a long way towards reducing the rigidity of
complex command languages such as those required for message processing
systems. The combination should also be a good representation for natural
language systems in domains where it is possible to develop a strong
underlying conceptual space, such as management information systems
(Malhotra 1975).
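
As a rough illustration of the case frame idea for command languages, the sketch below fills the cases of a command by the semantic class of each argument rather than by its position. The command, its cases and the word classes are invented for the example and are not taken from the NLS application.

```python
# Hypothetical case frame: each case names the semantic class that fills it.
CASE_FRAMES = {"move": {"object": "text-unit", "destination": "location"}}

# Hypothetical semantic classification of words.
SEMANTIC_CLASS = {
    "word": "text-unit", "line": "text-unit", "paragraph": "text-unit",
    "end": "location", "top": "location",
}


def parse_command(sentence):
    """Return (command, filled cases, missing cases), or None if no command."""
    words = sentence.lower().split()
    command = next((w for w in words if w in CASE_FRAMES), None)
    if command is None:
        return None
    frame = {case: None for case in CASE_FRAMES[command]}
    for w in words:
        cls = SEMANTIC_CLASS.get(w)
        for case, wanted in CASE_FRAMES[command].items():
            if cls == wanted and frame[case] is None:
                frame[case] = w
    missing = [case for case, value in frame.items() if value is None]
    return command, frame, missing
```

Because cases are filled by class, "move the paragraph to the end" and "to the top move this line" are both accepted. An incomplete command reports which case is missing, so the user does not face a rigid syntax error.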



The extension of the semantic grammar to incorporate existing
linguistic processing techniques is another potentially fruitful research
area. One of the ways semantic grammar gains efficiency is to separate the
processing of syntactically similar sentences on semantic grounds when it
is useful to do so. However, this prevents the uniform incorporation of,
for example, Woods' (1973b) solution to the problems of relative clause
modification, quantifiers and conjunction. One means of integrating these
techniques would be to develop an intermediate target language that
maintains the advantages of the semantic grammar approach while allowing
uniform solutions to other problems. It may even be possible to adopt
Woods' query language, allowing the semantic grammar to dictate the
functions within the "propositions" and "commands". An alternative attack
would be to use a "syntactic" processing phase, incorporating the desired
techniques, that canonicalizes the input before it is processed by the
semantic grammar. In this method, the semantic grammar would be viewed as
an interpretation phase of the understanding process, but one which works
on a much less structured syntactic parse than, for example, the LUNAR
system.

CONCLUSIONS

In the course of this report, we have described the evolution of a
natural language front-end from keyword beginnings to a system capable of
using complex linguistic knowledge. The guiding strand has been the
utilization of semantic information to produce efficient natural language
processors. There were several highlights that represent noteworthy points
in the spectrum of useful natural language systems. Toward the keyword end
of the scale, the procedural encoding technique with fuzziness (Chapter 4
and Appendix B) allows simple natural language input to be accepted without
introducing the complexity of a new formalism. Encoding the rules as
procedures allowed flexible control of the fuzziness, and the semantic
nature of the rules provides the correct places to take advantage of the
flexibility. As the language covered by the system becomes more complex,
the additional burden of a grammar formalism will more than pay for itself
in terms of ease of development and reduction in complexity. The ATN
compiling system allows for the consideration of the ATN formalism by
reducing its runtime cost, making it comparable to a direct procedural
encoding. The natural language front end now used by SOPHIE is constructed
by compiling a semantic ATN. As the linguistic complexity of the language
accepted by the system increases, the need for more syntactic knowledge in
the grammar becomes greater. Unfortunately, this often works at cross
purposes with the semantic character of the grammar. It would be nice to
have a general grammar for English syntax that could be used to preprocess
sentences; however, one is not forthcoming. A general solution to the
problem of incorporating semantics with the current state of incomplete
knowledge of syntax remains an open research problem. In the foreseeable
future, any system will have to be an engineering trade-off between
complexity and generality on one hand and efficiency and habitability on
the other. We have presented several techniques that are viable bargains
in this trade-off.
Appendix A



BNF Description of Part of the
SOPHIE Semantic Grammar



This appendix gives a BNF-like description of part of the language
accepted by SOPHIE. Included are all of the rules necessary to parse a
"measurement". Examples of "measurements" are "voltage at N1", "base
emitter current of Q5", and "output voltage". The grammar is implemented
as LISP functions and an example is listed in Appendix B.

In the description, alternatives on the right-hand side are separated
by ! or are listed on separate lines. Brackets [] enclose optional
elements. An asterisk * is used to mark notes about a particular rule.
Non-terminals are designated by names enclosed in angle brackets <>.
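
The notation can be read mechanically. The sketch below is an illustration only, not part of SOPHIE: it expands a right-hand side written with alternatives and bracketed optional elements into the phrases it covers. The list-based encoding of rules, the `expand` function and the abbreviated rule table (with "q5" standing in for a full transistor specification) are assumptions.

```python
from itertools import product


def expand(rhs, rules):
    """Yield every word sequence covered by one right-hand side.

    rhs is a list whose items are a terminal word, a "<name>" non-terminal,
    or a nested list standing for a bracketed (optional) group.
    rules maps each non-terminal to its list of alternative right-hand sides.
    """
    choices = []
    for item in rhs:
        if isinstance(item, list):        # [ ... ] : optional group
            choices.append([()] + [tuple(s) for s in expand(item, rules)])
        elif item.startswith("<"):        # non-terminal: try every alternative
            choices.append([tuple(s) for alt in rules[item]
                            for s in expand(alt, rules)])
        else:                             # terminal word
            choices.append([(item,)])
    for combo in product(*choices):
        yield [word for part in combo for word in part]


# An abbreviated, partly hypothetical subset of the rules.
RULES = {
    "<junction/type>": [["eb"], ["be"]],
    "<meas/quant>": [["voltage"], ["current"]],
    "<transistor/spec>": [["q5"]],
}

# <junction/type> <meas/quant> [of <transistor/spec>]
RHS = ["<junction/type>", "<meas/quant>", ["of", "<transistor/spec>"]]
PHRASES = [" ".join(p) for p in expand(RHS, RULES)]
```

Here the single rule covers eight phrases, from "eb voltage" to "be current of q5". Each optional group doubles the number of forms one line of the grammar accepts.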

The Grammar

<circuit/place> := <terminal> ! <node>

<diode/spec> := <diode> ! <zener/diode>
                <section> diode ! <section> zener/diode

<junction> := <junction/type> [of] <transistor/spec>
              <transistor/term/type> and <transistor/term/type> [of]
                  [<transistor/spec>]
              <transistor/term/type> to <transistor/term/type> [of]
                  [<transistor/spec>]

<junction/type> := eb ! be ! ec ! ce ! cb ! bc

<meas/quant> := voltage ! current ! resistance* ! power
     *means measured resistance

<measurement> := <section> [output*] [<meas/quant>]
                 output* <meas/quant> [of] <section>
                 output* [<meas/quant>] [of <transformer>]
                 <transformer> <meas/quant>
                 <meas/quant> between** <circuit/place> and
                     <circuit/place>
                 <meas/quant> of*** <part/spec>
                 <meas/quant> between output terminals
                 <meas/quant> of <junction>
                 <meas/quant> of <circuit/place>
                 <meas/quant> from <junction>
                 <meas/quant> of <section>
                 <meas/quant> of <pronoun>
                 <junction/type> <meas/quant> [of <transistor/spec>]
                 <transistor/term/type> <meas/quant> of
                     [<transistor/spec>]
     *input also
     **from-to also works
     ***at, thru, in, into, across and through also work

<node> := junction of <part/spec> and <part/spec>
          node between <section> and <section>
          [point] between <part/spec> and <part/spec>
          <node/name> ! [node] <node/number>
          <pronoun>

<num/spec> := "any positive number" [k] ! one

<part/spec> := <part/name> ! <load/spec> ! <section> <part/type>
               <pronoun>
<pot/spec> := cc ! vc ! qcI

<pronoun> := it ! [that] "type"

<terminal> := output [terminal] ! <transistor/term> ! center/tap
              positive terminal [<part/spec>] ! positive one
              negative terminal [<part/spec>] ! negative one
              anode [<diode/spec>] ! cathode [<diode/spec>]
              wiper [<pot/spec>]

<transistor/spec> := <transistor> ! <section> transistor ! <pronoun>

<transistor/term> := <transistor/term/type> [<transistor/spec>]

<transistor/term/type> := base ! collector ! emitter

<transistor>, <capacitor>, <diode>, <resistor>, <transformer> and
<zener/diode> all check the semantic network and parse correct part names,
e.g. r9, q5.

<section> uses the semantic network to determine if a word is a section of
the unit, e.g. current/limiter.

<part/name> uses the semantic network to see if a word is the name of a
part, e.g. r6, c4, t2.

<node/name> checks the semantic network for node names.



Appendix B

A LISP Rule from the Semantic Grammar

This appendix describes the method of encoding the grammar as LISP
procedures. The ways of expressing a non-terminal are embodied in a
grammar function. Each grammar function takes at least two arguments:
STR, the list of words to be recognized, and N, the degree of fuzziness
allowed. The grammar function, in effect, must determine whether the
beginning of the string STR contains an occurrence of the corresponding
non-terminal. There are generally two types of checks that a grammar
function performs. One is a check for the occurrence of a word or words
which satisfy certain predicates. This checking is done with two
functions, CHECKLST and CHECKSTR. CHECKLST looks for a word in the
string matching any of a list of words. CHECKSTR looks for a word in the
string satisfying an arbitrary predicate. It is through these functions
that the parser implements its fuzziness. For example, if CHECKSTR is
called with the string "resistor R9" and a predicate which determines if a
word is the name of a part (e.g. "R9"), CHECKSTR will succeed by skipping
the word "resistor", which in this phrase is a noise word.

The other usual type of operation performed by the grammar functions
is to check for the occurrence of other non-terminals. This is done by
calling the proper function (grammar rule) and passing it the correct
position in the input string.

If a grammar rule is successful, the function passes back two pieces
of information. First, it returns some indication of how much of the input
string is accepted (i.e. where it stopped). The convention adopted is
that the grammar rule returns as its value a pointer to the last word in
the string accepted by the rule. Second, the function passes back a
structural description of the phrase that was parsed. This structure is
passed back in the free variable RESULT (analogous to an ATN's * upon
return from a PUSH).
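
The behaviour described for CHECKSTR and CHECKLST can be sketched as below. This is an illustration, not the INTERLISP code: it assumes that the fuzziness N is the number of leading words that may be skipped, and it returns an index into the word list where the original returns a pointer into the string.

```python
import re


def checkstr(words, predicate, n):
    """Return the index of the first word satisfying predicate.

    At most n leading words are skipped (assumed meaning of the fuzziness);
    None is returned if no word within that window qualifies.
    """
    for i, word in enumerate(words[: n + 1]):
        if predicate(word):
            return i
    return None


def checklst(words, targets, n):
    """Like checkstr, but the test is membership in a list of words."""
    return checkstr(words, lambda w: w in targets, n)


def is_part_name(word):
    """Hypothetical stand-in for the semantic-network lookup of part names."""
    return re.fullmatch(r"[rcqt]\d+", word.lower()) is not None
```

With a fuzziness of 1, the phrase "resistor R9" succeeds because "resistor" is skipped as a noise word. With a fuzziness of 0 the same call fails.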

Listed below is the grammar rule for the concept of a junction of a
transistor. This rule accepts phrases such as "base emitter junction of
Q5", "BE of the current limiting transistor", or "collector emitter
junction".

(<JUNCTION>
  [LAMBDA (STR N)
    (PROG (TS1 R1)
          (RETURN
            (AND

              (* COMMENT A)

              [OR (AND (SETQ TS1 (<JUNCTION/TYPE> STR N))
                       (SETQ R1 RESULT))
                  (AND (SETQ TS1 (<TRANSISTOR/TERM/TYPE> STR N))
                       (SETQ R1 RESULT)
                       [SETQ TS1
                         (<TRANSISTOR/TERM/TYPE>
                           (CDR (CHECKLST (CDR TS1)
                                          (QUOTE (AND TO]
                       (SETQ R1 (JUNCTION-OF-TERMS R1 RESULT]

              (* COMMENT B)

              (COND
                ([SETQ STR (<TRANSISTOR/SPEC>
                             (CDR (GOBBLE (GOBBLE TS1 (QUOTE (JUNCTION)))
                                          (QUOTE (OF]
                 (SETQ RESULT (LIST R1 RESULT))
                 STR)
                ([SETQ RESULT (LIST R1 (LIST (QUOTE PREF)
                                             (QUOTE (TRANSISTOR]
                 TS1])



COMMENT A:

The first thing that is looked for is either a <junction/type> (BE, emitter
collector, etc.) or two <transistor/terminal/type>s (base, emitter or
collector) separated by the words "and" or "to". If two terminals are
found, the function JUNCTION-OF-TERMS is called to determine the proper
junction. In either case, the place where the successful subsidiary rule
left off is saved in TS1 and the meaning of the accepted phrase is saved in
R1.

COMMENT B:

The next thing needed for a junction is a transistor, <TRANSISTOR/SPEC>.
<TRANSISTOR/SPEC> looks for an occurrence of a transistor, e.g. "Q5" or
"current limiting transistor". GOBBLE is a function for skipping
relational words when they are not used to restrict the remaining part of
the phrase. If a transistor is not found, a deletion is hypothesized and a
call to PREF is constructed. If the transistor has been pronominalized, as
in "the base emitter of it", <TRANSISTOR/SPEC> would recognize "it". In
either case the semantics of the recognized phrase (something like (EB Q5))
is put into RESULT and a pointer to the last recognized word is returned as
the value of <JUNCTION>.

There are approximately 80 grammar rules in SOPHIE's grammar.
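
For readers unfamiliar with INTERLISP, the control flow of the <JUNCTION> rule can be restated as the sketch below. It is a simplification under several assumptions: it returns the remaining words and the meaning as a pair instead of using a pointer and the free variable RESULT, it ignores fuzziness, and it treats the connective between two terminal types as optional. It recognizes only transistor names of the form Q followed by digits. Joining two terminal letters is a stand-in for JUNCTION-OF-TERMS, whose actual behaviour is not given.

```python
import re

JUNCTION_TYPES = {"eb", "be", "ec", "ce", "cb", "bc"}
TERMINAL_TYPES = {"base": "B", "collector": "C", "emitter": "E"}


def junction(words):
    """Return (remaining words, meaning) for a junction phrase, else None."""
    words = [w.lower() for w in words]
    # COMMENT A: a junction type, or two terminal types (optionally joined).
    if words and words[0] in JUNCTION_TYPES:
        kind, rest = words[0].upper(), words[1:]
    elif words and words[0] in TERMINAL_TYPES:
        rest = words[1:]
        if rest and rest[0] in ("and", "to"):
            rest = rest[1:]
        if not rest or rest[0] not in TERMINAL_TYPES:
            return None
        # Stand-in for JUNCTION-OF-TERMS: join the two terminal letters.
        kind = TERMINAL_TYPES[words[0]] + TERMINAL_TYPES[rest[0]]
        rest = rest[1:]
    else:
        return None
    # COMMENT B: skip the relational words, as GOBBLE does.
    for filler in ("junction", "of"):
        if rest and rest[0] == filler:
            rest = rest[1:]
    if rest and re.fullmatch(r"q\d+", rest[0]):
        return rest[1:], (kind, rest[0].upper())
    # No transistor found: hypothesize a deletion to be resolved by PREF.
    return rest, (kind, ("PREF", ("TRANSISTOR",)))
```

A phrase with no transistor, such as "collector emitter junction", still succeeds. Its meaning carries a PREF placeholder that the dialogue context is expected to resolve.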
Appendix C

Sample Parses and Parse Times for the LISP Implementation

This appendix presents some examples of sentences handled by the
natural language processor together with their parse times. Under each
statement, the semantic interpretation returned by the parser is given.
The semantic interpretation is a function call which when evaluated
performs the processing required by the statement. Parse times are given
in milliseconds.

Insert a fault.
(INSERTFAULT NIL)
85 ms

What is the output voltage?
(MEASURE VOLTAGE NIL OUTPUT)
40 ms

What is the voltage between the current limiting transistor
and the constant current source?
(MEASURE VOLTAGE (NODE/BETWEEN
                   (FINDPART CURRENT/LIMITER TRANSISTOR)
                   CURRENT/SOURCE))
335 ms

What is the voltage between there and the base of Q5?
(MEASURE VOLTAGE (PREF (NODE TERMINAL)) (BASE Q6))
80 ms

Q5?
(REFERENCE ((TRANSISTOR) Q5))
60 ms

Could the problem be that Q5 is bad?
(TESTFAULT Q5 BAD)
100 ms

Could it be shorted?
(TESTFAULT (PREF (PART JUNCTION TERMINAL)) SHORT)
75 ms

If R8 were 30k what would the output voltage be?
(IFTHEN (R8 30000.0 VALUE)
        (MEASURE VOLTAGE NIL OUTPUT))
220 ms

If C2 were leaky what would the voltage across it be?
(IFTHEN (C2 LEAKY)
        (MEASURE VOLTAGE (PREF (PART JUNCTION))))
120 ms

What is the output voltage when the voltage control is set to .5?
(RESETCONTROL (STQ VC .5)
              (MEASURE VOLTAGE NIL OUTPUT))
85 ms

What is it with it set at .6?
(RESETCONTROL (STQ (PREF (POT LOAD SWITCH)) .6)
              (REFERENCE NIL))
110 ms

If it is set to .9?
(RESETCONTROL (STQ (PREF (POT LOAD SWITCH)) .9)
              (REFERENCE NIL))
135 ms

What is the current thru the cc when the vc is set to 1.0?
(RESETCONTROL (STQ VC 1.0)
              (MEASURE CURRENT CC))
190 ms

If Q6 has an open emitter and a shorted base collector
junction, what happens to the voltage between its base and
the junction of the voltage limiting section and the voltage
reference source?
(IFTHEN
  (MULT ((EMITTER Q6) OPEN)
        ((BC (PREF (TRANSISTOR))) SHORT))
  (MEASURE VOLTAGE
           (BASE (PREF (TRANSISTOR)))
           (NODE/BETWEEN VOLTAGE/LIMITER REFERENCE/VOLTAGE)))
Appendix D

Examples of ATN Compilation

This appendix presents a simple augmented transition network grammar
along with two different programs compiled from it and a trace of the first
program parsing a sentence. The ATN grammar was taken from (Woods 1970).
Both compiled versions of the grammar assume a depth-first search strategy
and use configurations which include the state, node, stack, registers,
features and hold list.

The first program does not support lexical ambiguity (neither that
caused by compound rules nor that caused by multiple interpretations under
the same category). In addition, it neither keeps a well-formed substring
table, tests for input before pushing, nor returns features with popped
constituents. The second program, on the other hand, has all of these
capabilities. The listing of the second program also includes tracing
functions the compiler includes in the program to allow the user to follow
its operation. Both programs are given in CLISP (Teitelman 1974).

The final section of the appendix contains a trace of the first
program (using a version which did include tracing functions) discovering
all possible parses of the sentence "John was believed to have been shot by
Fred". Shown in the trace are all of the arc transitions taken by the
parser together with all register setting operations. (The reader may
compare this with the analysis of this sentence given in (Woods 1970).)
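
The depth-first strategy with a stack of alternatives can be illustrated on the NP/ portion of the grammar alone. The sketch below is an interpreter written for illustration, not the output of the ATN compiler. The lexicon, the tuple encoding of arcs and configurations, and the function name are assumptions, and like the first program it allows one category per word.

```python
# Hypothetical lexicon: one category per word.
LEXICON = {"the": "DET", "old": "ADJ", "red": "ADJ", "man": "N", "john": "NPR"}

# NP/ subnetwork: ("CAT", category, register, next state) or ("POP", registers).
NETWORK = {
    "NP/":  [("CAT", "DET", "DET", "NP/1"), ("CAT", "NPR", "NPR", "NP/3")],
    "NP/1": [("CAT", "ADJ", "ADJS", "NP/1"), ("CAT", "N", "N", "NP/2")],
    "NP/2": [("POP", ("DET", "ADJS", "N"))],
    "NP/3": [("POP", ("NPR",))],
}


def parse_np(words):
    """Depth-first search; untried arcs wait on a stack of alternatives."""
    alts = [("NP/", 0, 0, {})]     # (state, arc index, input position, registers)
    while alts:
        state, arc_i, pos, regs = alts.pop()
        arcs = NETWORK[state]
        if arc_i >= len(arcs):
            continue                                  # state exhausted: detour
        alts.append((state, arc_i + 1, pos, regs))    # remember the next arc
        arc = arcs[arc_i]
        if arc[0] == "POP":
            if pos == len(words):                     # success: input consumed
                return ("NP",) + tuple((r, regs.get(r)) for r in arc[1])
        elif pos < len(words) and LEXICON.get(words[pos]) == arc[1]:
            _, _, reg, nxt = arc
            new = dict(regs)
            # ADJS accumulates a list (as ADDL does); other registers are set.
            new[reg] = regs.get(reg, []) + [words[pos]] if reg == "ADJS" else words[pos]
            alts.append((nxt, 0, pos + 1, new))
    return None                                       # failure: no alternatives
```

Each configuration pushed back on the stack plays the role of an alternative saved by ALTARC, and popping the stack corresponds to DETOUR. Replacing the stack with another ordering would change the search strategy without touching the arcs.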
The grammar




((S/
   (CAT AUX T
        (SETR V *)
        (SETR TNS (LIST (GETF * TENSE)))
        (SETRQ TYPE Q)
        (TO Q1/))
   (PUSH NP/ T
         (SETR SUBJ *)
         (SETRQ TYPE DCL)
         (TO Q2/)))
 (Q1/
   (PUSH NP/ T
         (SETR SUBJ *)
         (TO Q3/)))
 (Q2/
   (CAT V T
        (SETR V *)
        (SETR TNS (LIST (GETF * TENSE)))
        (TO Q3/)))
 (Q3/
   (CAT V (AND (GETF * PPRT)
               (EQ (GETR V)
                   (QUOTE BE)))
        (HOLD (GETR SUBJ))
        (SETR SUBJ (BUILDQ (NP (PRO SOMEONE))))
        (SETR AGFLAG T)
        (SETR V *)
        (TO Q3/))
   (CAT V (AND (GETF * PPRT)
               (EQ (GETR V)
                   (QUOTE HAVE)))
        (SETR TNS (APPEND (GETR TNS)
                          (QUOTE (PERFECT))))
        (SETR V *)
        (TO Q3/))
   (PUSH NP/ (TRANS (GETR V))
         (SETR OBJ *)
         (TO Q4/))
   (VIR NP (TRANS (GETR V))
        (SETR OBJ *)
        (TO Q4/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +)))
                TYPE SUBJ TNS V)
        (INTRANS (GETR V))))
 (Q4/
   (WRD BY (GETR AGFLAG)
        (SETR AGFLAG NIL)
        (TO Q7/))
   (WRD TO (S-TRANS (GETR V))
        (TO Q5/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +) +))
                TYPE SUBJ TNS V OBJ)
        T))
 (Q5/
   (PUSH VP/ T
         (SENDR SUBJ (GETR OBJ))
         (SENDR TNS (GETR TNS))
         (SENDRQ TYPE DCL)
         (SETR OBJ *)
         (TO Q6/)))
 (Q6/
   (WRD BY (GETR AGFLAG)
        (SETR AGFLAG NIL)
        (TO Q7/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +) +))
                TYPE SUBJ TNS V OBJ)
        T))
 (Q7/
   (PUSH NP/ T
         (SETR SUBJ *)
         (TO Q6/)))
 (VP/
   (CAT V (GETF * UNTENSED)
        (SETR V *)
        (TO Q3/)))
 (NP/
   (CAT DET T
        (SETR DET *)
        (TO NP/1))
   (CAT NPR T
        (SETR NPR *)
        (TO NP/3)))
 (NP/1
   (CAT ADJ T
        (ADDL ADJS *)
        (TO NP/1))
   (CAT N T
        (SETR N *)
        (TO NP/2)))
 (NP/2
   (POP (BUILDQ (NP (DET +) (ADJ +) (N +))
                DET ADJS N)
        T))
 (NP/3
   (POP (BUILDQ (NP (NPR +))
                NPR)
        T))
)
Version I

(PARSER
  (LAMBDA (ACF)
    (PROG (STATE NODE STACK REGS HOLD * LEX)

       The current status of the machine is kept in five global
       variables: (1) STATE, the state/arc in the grammar, (2)
       NODE, the pointer into the input, (3) REGS, the list of
       register name-value pairs, (4) STACK, the return stack, and
       (5) HOLD, the hold list. Putting the machine into a given
       configuration involves assigning values to these five
       variables.

      SPREAD-ACF
          (STATE←(CF.STATE ACF))
          (REGS←(CF.REGS ACF))
          (STACK←(CF.STACK ACF))
          (HOLD←(CF.HOLD ACF))
          (NODE←(CF.NODE ACF))
          (LEX←(EDGE.WORD (FIRST.EDGE NODE)))

       BRANCH dispatches control to the label specified by STATE.
       This is the method of executing an arc.

      EVALARC
          (BRANCH STATE SUCCESS DETOUR S/ S/-2 S/-2-PUSH Q1/
                  Q1/-1-PUSH Q2/ Q3/ Q3/-2 Q3/-3 Q3/-4 Q3/-5
                  Q3/-3-PUSH Q4/ Q4/-2 Q4/-3 Q5/ Q5/-1-PUSH Q6/
                  Q6/-2 Q7/ Q7/-1-PUSH VP/ NP/
                  NP/-2 NP/1 NP/1-2 NP/2 NP/3)

       SUCCESS checks to make sure all of the input has been
       processed. If not it detours.

      SUCCESS
          (if (EMPTYP.NODE NODE)
              then (RETURN *)
            else (GO DETOUR))

       DETOUR decides which alternative to try next. In this case
       the alternatives list is a stack.

      DETOUR
          (if ALTS
              then ACF←(ALTS.FIRST)
                   (ALTS.BUTFIRST)
                   (GO SPREAD-ACF)
            else (RETURN (FAILURE)))

       This is the beginning of the code which is compiled from the
       arcs. The first arc of each state has a label which is the
       same as the state name in the ATN. The other arcs have a
       label which is the state name followed by "-" and the arc
       number. Labels which end in "-PUSH" indicate the actions
       and termination action of PUSH arcs.

      S/  (if (ARCCAT AUX)
              then (ALTARC S/-2)
                   (SETR 'V *)
                   (SETR 'TNS <(GETF * 'TENSE)>)
                   (SETRQ TYPE Q)
                   (DOTO Q1/)
                   (GO Q1/))
      S/-2
          (DOPUSH NP/ S/-2-PUSH)
          (GO NP/)
      S/-2-PUSH
          (SETR 'SUBJ *)
          (SETRQ TYPE DCL)
          (DOPTO Q2/)
          (GO Q2/)
      Q1/ (DOPUSH NP/ Q1/-1-PUSH)
          (GO NP/)
      Q1/-1-PUSH
          (SETR 'SUBJ *)
          (DOPTO Q3/)
          (GO Q3/)
      Q2/ (if (ARCCAT V)
              then (SETR 'V *)
                   (SETR 'TNS <(GETF * 'TENSE)>)
                   (DOTO Q3/)
                   (GO Q3/))
          (GO DETOUR)
      Q3/ (while (ARCCAT V) and (GETF * 'PPRT)
                 and (GETR V)='BE
             do (ALTARC Q3/-2)
                (HOLD (GETR SUBJ))
                (SETR 'SUBJ (BUILDQ (NP (PRO SOMEONE))))
                (SETR 'AGFLAG T)
                (SETR 'V *)
                (DOTO Q3/))
      Q3/-2
          (if (ARCCAT V) and (GETF * 'PPRT)
                 and (GETR V)='HAVE
              then (ALTARC Q3/-3)
                   (SETR 'TNS <!(GETR TNS) !'(PERFECT)>)
                   (SETR 'V *)
                   (DOTO Q3/)
                   (GO Q3/))
      Q3/-3
          (if (TRANS (GETR V))
              then (ALTARC Q3/-4)
                   (DOPUSH NP/ Q3/-3-PUSH)
                   (GO NP/))
      Q3/-4
          (if (HOLDSCAN HOLD 'NP '(TRANS (GETR V)))
              then (ALTARC Q3/-5)
                   (PREVIRACTS)
                   (SETR 'OBJ *)
                   (DOVIRTO Q4/)
                   (GO Q4/))
      Q3/-5
          (if (INTRANS (GETR V))
              then (DOPOP (BUILDQ (S + + (TNS +) (VP (V +)))
                                  TYPE SUBJ TNS V))
                   (GO EVALARC))
          (GO DETOUR)
      Q3/-3-PUSH
          (SETR 'OBJ *)
          (DOPTO Q4/)
          (GO Q4/)
      Q4/ (if (ARCWRD BY) and (GETR AGFLAG)
              then (ALTARC Q4/-2)
                   (SETR 'AGFLAG NIL)
                   (DOTO Q7/)
                   (GO Q7/))
      Q4/-2
          (if (ARCWRD TO) and (S-TRANS (GETR V))
              then (ALTARC Q4/-3)
                   (DOTO Q5/)
                   (GO Q5/))
      Q4/-3
          (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                         TYPE SUBJ TNS V OBJ))
          (GO EVALARC)
      Q5/ (SENDR 'SUBJ (GETR OBJ))
          (SENDR 'TNS (GETR TNS))
          (SENDRQ TYPE DCL)
          (DOPUSH VP/ Q5/-1-PUSH)
          (SREGS NIL)
          (GO VP/)
      Q5/-1-PUSH
          (SETR 'OBJ *)
          (DOPTO Q6/)
          (GO Q6/)
      Q6/ (if (ARCWRD BY) and (GETR AGFLAG)
              then (ALTARC Q6/-2)
                   (SETR 'AGFLAG NIL)
                   (DOTO Q7/)
                   (GO Q7/))
      Q6/-2
          (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                         TYPE SUBJ TNS V OBJ))
          (GO EVALARC)
      Q7/ (DOPUSH NP/ Q7/-1-PUSH)
          (GO NP/)
      Q7/-1-PUSH
          (SETR 'SUBJ *)
          (DOPTO Q6/)
          (GO Q6/)
      VP/ (if (ARCCAT V) and (GETF * 'UNTENSED)
              then (SETR 'V *)
                   (DOTO Q3/)
                   (GO Q3/))
          (GO DETOUR)
      NP/ (if (ARCCAT DET)
              then (ALTARC NP/-2)
                   (SETR 'DET *)
                   (DOTO NP/1)
                   (GO NP/1))
      NP/-2
          (if (ARCCAT NPR)
              then (SETR 'NPR *)
                   (DOTO NP/3)
                   (GO NP/3))
          (GO DETOUR)
      NP/1
          (while (ARCCAT ADJ) do (ALTARC NP/1-2)
                 (ADDL 'ADJS *)
                 (DOTO NP/1))
      NP/1-2
          (if (ARCCAT N)
              then (SETR 'N *)
                   (DOTO NP/2)
                   (GO NP/2))
          (GO DETOUR)
      NP/2
          (DOPOP (BUILDQ (NP (DET +)
                             (ADJ +)
                             (N +))
                         DET ADJS N))
          (GO EVALARC)
      NP/3
          (DOPOP (BUILDQ (NP (NPR +))
                         NPR))
          (GO EVALARC))))



Version II - ••■ . 

(PARSER 

(LAMBDA (ACF) ^ ' ■' . 

(PROG (STATE NODE STACK REGS FEATS HOLD » LEX SREGS " 
SFEATS FEATURES TEMP) 

If the function is called with an argument of 'GO, it looks 
for ^another parse. This allows the user to get out more 
than the first parse. 

(iTACFz'GO " , 
" then (GO DETOUR)) , ■ 

•The current cstatus ol 



^status of the^machine^is kept in > five, global 

variables: (1) STATE, the state/arc ir the grammar, (2) 

the- list of 



NODE, the pointer into the input, (3) REGS, j.xov. 
register name-value pairs, (4) STACK, the~return stack, "alfd 
■(5) HOLD, the hold list. Putting the machine into a given 
cdnf : i^uration involves assigning values to these five 
variables. ^ - - , 

SPREAD-ACF
    (CHANGESTATE (CF.STATE ACF))
    (REGS←(CF.REGS ACF))
    (FEATS←(CF.FEATS ACF))
    (STACK←(CF.STACK ACF))
    (HOLD←(CF.HOLD ACF))
    (LEX←(EDGE.WORD (FIRST.EDGE NODE←(CF.NODE ACF))))

TRACEALTSTART is one of the tracing functions provided to
allow the user to follow the operations of the parser. The
others are TRACEARC and ABORT. None of these result in any
code when a fast version of the parser is produced.

    (TRACEALTSTART)
    (GO EVALARC)

NEXTLEX

If the current node has more than one lexical interpretation
(BUTFIRST.EDGE), the code sets NODE to try the next one.

    (if (BUTFIRST.EDGE NODE)
        then LEX←(EDGE.WORD (FIRST.EDGE
                              NODE←(BUTFIRST.EDGE NODE)))
             (GO EVALARC))



BRANCH dispatches control to the label specified by STATE.

EVALARC
    (BRANCH STATE SUCCESS DETOUR S/ S/-1-CONT
            S/-1-CAT S/-2-PUSH Q1/ Q1/-1-PUSH Q2/
            Q2/-1-CONT Q2/-1-CAT Q3/ Q3/-1-CONT
            Q3/-2 Q3/-2-CONT Q3/-3 Q3/-4 Q3/-5
            Q3/-1-CAT Q3/-2-CAT Q3/-3-PUSH Q4/ Q4/-2 Q4/-3
            Q5/ Q5/-1-PUSH Q6/ Q6/-2 Q7/ Q7/-1-PUSH
            VP/ VP/-1-CONT VP/-1-CAT NP/ NP/-1-CONT
            NP/-2 NP/-2-CONT NP/-1-CAT NP/-2-CAT
            NP/1 NP/1-1-CONT NP/1-2 NP/1-2-CONT
            NP/1-1-CAT NP/1-2-CAT NP/2 NP/3)
SUCCESS
    (RETURN NODE)



DETOUR chooses an alternative from the ALTS list. In this
version the ALTS list is a stack. The detouring mechanism
could be changed by redefining ALTS.FIRST and ALTS.BUTFIRST.
If there are no more alternatives, the first alternative
from the list of SUSPENDED alts is taken. The suspended
alternatives are maintained in order by weight.

ABORT
    (ABORT)            ABORT is a tracing function.

DETOUR
    (if ALTS
        then ACF←(ALTS.FIRST)
             (ALTS.BUTFIRST)
             (GO SPREAD-ACF)
      elseif SUSPENDEDALTS
        then ACF←(SUSPEND.POP)
             (GO SPREAD-ACF)
      else (RETURN (FAILURE)))
S/  (if (ARCCAT AUX)
        else (GO S/-2))
    (ALTARC S/-2)
    (TRACEARC CAT AUX S/-1)
S/-1-CONT
    (ALTCAT S/-1-CAT)
    (SETR 'V *)
    (SETR 'TNS <(GETF * 'TENSE)>)
    (SETRQ TYPE Q)
    (DOTO Q1/)
    (GO Q1/)
S/-2
    (if (STRINGLEFTP)
        then (NEXTLEXALT S/)
             (TRACEARC PUSH NIL S/-2)
             (DOPUSH NP/ S/-2-PUSH)
             (GO DETOUR))
    (CHANGESTATEQ S/)
    (GO NEXTLEX)
S/-1-CAT
    (ARCCAT AUX)
    (TRACEARC ALTCAT AUX S/-1)
    (GO S/-1-CONT)
S/-2-PUSH
    (SPREAD/WFS)
    (SETR 'SUBJ *)
    (SETRQ TYPE DCL)
    (DOPTO Q2/)
    (GO Q2/)
Q1/ (if (STRINGLEFTP)
        then (NEXTLEXALT Q1/)
             (TRACEARC PUSH NIL Q1/-1)
             (DOPUSH NP/ Q1/-1-PUSH)
             (GO DETOUR))
    (CHANGESTATEQ Q1/)
    (GO NEXTLEX)
Q1/-1-PUSH
    (SPREAD/WFS)
    (SETR 'SUBJ *)
    (DOPTO Q3/)
    (GO Q3/)

Q2/ (if (ARCCAT V)
        else (CHANGESTATEQ Q2/)
             (GO NEXTLEX))
    (NEXTLEXALT Q2/)
    (TRACEARC CAT V Q2/-1)
Q2/-1-CONT
    (ALTCAT Q2/-1-CAT)
    (SETR 'V *)
    (SETR 'TNS <(GETF * 'TENSE)>)
    (DOTO Q3/)
    (GO Q3/)
Q2/-1-CAT
    (ARCCAT V)
    (TRACEARC ALTCAT V Q2/-1)
    (GO Q2/-1-CONT)
Q3/ (if (ARCCAT V)
        else (GO Q3/-2))
    (ALTARC Q3/-2)
    (TRACEARC CAT V Q3/-1)
Q3/-1-CONT
    (ALTCAT Q3/-1-CAT)
    (if ~((GETF * 'PPRT) and (GETR V)='BE)
        then (GO ABORT))
    (HOLD (GETR SUBJ))
    (SETR 'SUBJ (BUILDQ (NP (PRO SOMEONE))))
    (SETR 'AGFLAG T)
    (SETR 'V *)
    (DOTO Q3/)
    (GO Q3/)
Q3/-2
    (if (ARCCAT V)
        else (GO Q3/-3))
    (ALTARC Q3/-3)
    (TRACEARC CAT V Q3/-2)
Q3/-2-CONT
    (ALTCAT Q3/-2-CAT)
    (if ~((GETF * 'PPRT) and (GETR V)='HAVE)
        then (GO ABORT))
    (SETR 'TNS <! (GETR TNS) ! '(PERFECT)>)
    (SETR 'V *)
    (DOTO Q3/)
    (GO Q3/)
Q3/-3
    (if (STRINGLEFTP) and (TRANS (GETR V))
        then (ALTARC Q3/-4)
             (TRACEARC PUSH NIL Q3/-3)
             (DOPUSH NP/ Q3/-3-PUSH)
             (GO DETOUR))
Q3/-4
    (if TEMP←(HOLDSCAN HOLD 'NP '(TRANS (GETR V)))
        then (ALTARC Q3/-5)
             (TRACEARC VIR NP Q3/-4)
             (PREV15ACTS)
             (SETR 'OBJ *)
             (DOVIRTO Q4/)
             (GO Q4/))
Q3/-5
    (if (INTRANS (GETR V))
        then (NEXTLEXALT Q3/)
             (TRACEARC POP NIL Q3/-5)
             (DOPOP (BUILDQ (S + + (TNS +)
                               (VP (V +)))
                            TYPE SUBJ TNS V)
                    (GETR POPFEATS))
             (GO DETOUR))
    (CHANGESTATEQ Q3/)
    (GO NEXTLEX)



Q3/-1-CAT
    (ARCCAT V)
    (TRACEARC ALTCAT V Q3/-1)
    (GO Q3/-1-CONT)
Q3/-2-CAT
    (ARCCAT V)
    (TRACEARC ALTCAT V Q3/-2)
    (GO Q3/-2-CONT)
Q3/-3-PUSH
    (SPREAD/WFS)
    (SETR 'OBJ *)
    (DOPTO Q4/)
    (GO Q4/)
Q4/ (if (ARCWRD BY) and (GETR AGFLAG)
        then (ALTARC Q4/-2)
             (TRACEARC WRD BY Q4/-1)
             (SETR 'AGFLAG NIL)
             (DOTO Q7/)
             (GO Q7/))
Q4/-2
    (if (ARCWRD TO) and (S-TRANS (GETR V))
        then (ALTARC Q4/-3)
             (TRACEARC WRD TO Q4/-2)
             (DOTO Q5/)
             (GO Q5/))
Q4/-3
    (NEXTLEXALT Q4/)
    (TRACEARC POP NIL Q4/-3)
    (DOPOP (BUILDQ (S + + (TNS +)
                      (VP (V +) +))
                   TYPE SUBJ TNS V OBJ)
           (GETR POPFEATS))
    (GO DETOUR)
Q5/ (if (STRINGLEFTP)
        then (NEXTLEXALT Q5/)
             (TRACEARC PUSH NIL Q5/-1)
             (SENDR 'SUBJ (GETR OBJ))
             (SENDR 'TNS (GETR TNS))
             (SENDRQ TYPE DCL)
             (DOPUSH VP/ Q5/-1-PUSH)
             SREGS←NIL
             SFEATS←NIL
             (GO DETOUR))
    (CHANGESTATEQ Q5/)
    (GO NEXTLEX)
Q5/-1-PUSH
    (SPREAD/WFS)
    (SETR 'OBJ *)
    (DOPTO Q6/)
    (GO Q6/)

Q6/ (if (ARCWRD BY) and (GETR AGFLAG)
        then (ALTARC Q6/-2)
             (TRACEARC WRD BY Q6/-1)
             (SETR 'AGFLAG NIL)
             (DOTO Q7/)
             (GO Q7/))
Q6/-2
    (NEXTLEXALT Q6/)
    (TRACEARC POP NIL Q6/-2)
    (DOPOP (BUILDQ (S + + (TNS +)
                      (VP (V +) +))
                   TYPE SUBJ TNS V OBJ)
           (GETR POPFEATS))
    (GO DETOUR)
Q7/ (if (STRINGLEFTP)
        then (NEXTLEXALT Q7/)
             (TRACEARC PUSH NIL Q7/-1)
             (DOPUSH NP/ Q7/-1-PUSH)
             (GO DETOUR))
    (CHANGESTATEQ Q7/)
    (GO NEXTLEX)
Q7/-1-PUSH
    (SPREAD/WFS)
    (SETR 'SUBJ *)
    (DOPTO Q6/)
    (GO Q6/)
VP/ (if (ARCCAT V)
        else (CHANGESTATEQ VP/)
             (GO NEXTLEX))
    (NEXTLEXALT VP/)
    (TRACEARC CAT V VP/-1)
VP/-1-CONT
    (ALTCAT VP/-1-CAT)
    (if ~(GETF * 'UNTENSED)
        then (GO ABORT))
    (SETR 'V *)
    (DOTO Q3/)
    (GO Q3/)
VP/-1-CAT
    (ARCCAT V)
    (TRACEARC ALTCAT V VP/-1)
    (GO VP/-1-CONT)
NP/ (if (ARCCAT DET)
        else (GO NP/-2))
    (ALTARC NP/-2)
    (TRACEARC CAT DET NP/-1)
NP/-1-CONT
    (ALTCAT NP/-1-CAT)
    (SETR 'DET *)
    (DOTO NP/1)
    (GO NP/1)
NP/-2
    (if (ARCCAT NPR)
        else (CHANGESTATEQ NP/)
             (GO NEXTLEX))
    (NEXTLEXALT NP/)
    (TRACEARC CAT NPR NP/-2)
NP/-2-CONT
    (ALTCAT NP/-2-CAT)
    (SETR 'NPR *)
    (DOTO NP/3)
    (GO NP/3)
NP/-1-CAT
    (ARCCAT DET)
    (TRACEARC ALTCAT DET NP/-1)
    (GO NP/-1-CONT)
NP/-2-CAT
    (ARCCAT NPR)
    (TRACEARC ALTCAT NPR NP/-2)
    (GO NP/-2-CONT)
NP/1 (if (ARCCAT ADJ)
         else (GO NP/1-2))
    (ALTARC NP/1-2)
    (TRACEARC CAT ADJ NP/1-1)
NP/1-1-CONT
    (ALTCAT NP/1-1-CAT)
    (ADDL 'ADJS *)
    (DOTO NP/1)
    (GO NP/1)
NP/1-2
    (if (ARCCAT N)
        else (CHANGESTATEQ NP/1)
             (GO NEXTLEX))
    (NEXTLEXALT NP/1)
    (TRACEARC CAT N NP/1-2)
NP/1-2-CONT
    (ALTCAT NP/1-2-CAT)
    (SETR 'N *)
    (DOTO NP/2)
    (GO NP/2)
NP/1-1-CAT
    (ARCCAT ADJ)
    (TRACEARC ALTCAT ADJ NP/1-1)
    (GO NP/1-1-CONT)
NP/1-2-CAT
    (ARCCAT N)
    (TRACEARC ALTCAT N NP/1-2)
    (GO NP/1-2-CONT)

NP/2 (NEXTLEXALT NP/2)
    (TRACEARC POP NIL NP/2-1)
    (DOPOP (BUILDQ (NP (DET +)
                       (ADJ +)
                       (N +))
                   DET ADJS N)
           (GETR POPFEATS))
    (GO DETOUR)
NP/3 (NEXTLEXALT NP/3)
    (TRACEARC POP NIL NP/3-1)
    (DOPOP (BUILDQ (NP (NPR +))
                   NPR)
           (GETR POPFEATS))
    (GO DETOUR))))
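
The DETOUR step in the Version II listing can be paraphrased outside of INTERLISP. The following Python sketch is not part of the original listing; it only illustrates the policy described there (take from the ALTS stack first, then from the suspended alternatives, otherwise fail). The class and method names are hypothetical, and the assumption that a smaller weight is resumed first is ours, since the listing says only that suspended alternatives are kept "in order by weight".

```python
import heapq
from itertools import count


class Alternatives:
    """Hypothetical stand-in for the ALTS stack and the SUSPENDED alts list."""

    def __init__(self):
        self.alts = []        # LIFO stack, as ALTS is in Version II
        self.suspended = []   # heap ordered by weight (assumed: smallest first)
        self._tie = count()   # tie-breaker so configurations are never compared

    def store(self, config):
        """Save an alternative configuration (the role ALTARC plays)."""
        self.alts.append(config)

    def suspend(self, weight, config):
        """Park a configuration with a weight for later consideration."""
        heapq.heappush(self.suspended, (weight, next(self._tie), config))

    def detour(self):
        """Return the next configuration to resume, or None for FAILURE."""
        if self.alts:                       # (if ALTS then ...)
            return self.alts.pop()
        if self.suspended:                  # (elseif SUSPENDEDALTS then ...)
            return heapq.heappop(self.suspended)[2]
        return None                         # (else (RETURN (FAILURE)))
```

Because ALTS is a stack, the most recently stored alternative is resumed first; swapping the list for a queue would change the search order without touching the rest of the parser, which is the flexibility the listing attributes to redefining ALTS.FIRST and ALTS.BUTFIRST.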
Trace of Version I Parsing a Sentence

PARSE((JOHN WAS BELIEVED TO HAVE BEEN SHOT BY FRED))

Starting alternative 0
At arc S/
Node = (((JOHN NPR (&)) ((WAS V & AUX &) (& &))))

    The sentence is converted into a chart format. The chart
    contains information about the possible parts of speech of each
    word. Notice that "was" can be either a verb (V) or an auxiliary
    verb (AUX). (An & is used to indicate a further structure.)

Taking PUSH arc S/-2

    The trace indicates the arc type and its location in the grammar.
    No alternative is stored because S/-2 is the last arc in the
    state S/ and there are no lexical alternatives.

PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to JOHN

    The trace also indicates where registers get set.

Entering state NP/3
Node = (((WAS V (&) AUX (&)) ((BELIEVED V &) (& &))))
Taking POP arc NP/3-1
Trying to POP
(Continuing arc S/-2-PUSH)
Setting SUBJ to (NP (NPR JOHN))
Setting TYPE to DCL

Entering state Q2/
Node = (((WAS V (&) AUX (&)) ((BELIEVED V &) (& &))))
Taking CAT V arc Q2/-1
Setting V to BE
Setting TNS to (PAST)

Entering state Q3/
Node = (((BELIEVED V (&)) ((TO PREP &) (& &))))

    The alternative configuration to try the second arc leaving Q3/
    (Q3/-2) is created and saved after the test has succeeded on the
    first arc but before the arc is taken. This is alt 2 because
    configuration 1 was created during the earlier PUSH arc (i.e.
    the number is a configuration number).

Storing alt 2 for arc Q3/-2
Taking CAT V arc Q3/-1
HOLDing (NP (NPR JOHN))
Setting SUBJ to (NP (PRO SOMEONE))
Setting AGFLAG to T
Setting V to BELIEVE

Entering state Q3/
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 3 for arc Q3/-4
Taking PUSH arc Q3/-3
PUSHing for NP/
BLOCKED

Starting alternative 3
At arc Q3/-4
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 5 for arc Q3/-5
Taking VIR NP arc Q3/-4
(NP (NPR JOHN)) removed from HOLD list
Setting OBJ to (NP (NPR JOHN))

Entering state Q4/
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 6 for arc Q4/-3
Taking WRD TO arc Q4/-2

Entering state Q5/
Node = (((HAVE V (&)) ((BEEN V &) (& &))))
Taking PUSH arc Q5/-1
SENDing SUBJ value of (NP (NPR JOHN))
SENDing TNS value of (PAST)
SENDing TYPE value of DCL
PUSHing for VP/
Taking CAT V arc VP/-1
Setting V to HAVE

Entering state Q3/
Node = (((BEEN V (&)) ((SHOT V &) (& &))))
Storing alt 8 for arc Q3/-3
Taking CAT V arc Q3/-2
Setting TNS to (PAST PERFECT)
Setting V to BE

Entering state Q3/
Node = (((SHOT V (&)) ((BY PREP &) (& NIL))))
Storing alt 9 for arc Q3/-2
Taking CAT V arc Q3/-1
HOLDing (NP (NPR JOHN))
Setting SUBJ to (NP (PRO SOMEONE))
Setting AGFLAG to T
Setting V to SHOOT

Entering state Q3/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 10 for arc Q3/-4
Taking PUSH arc Q3/-3
PUSHing for NP/
BLOCKED

Starting alternative 10
At arc Q3/-4
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 12 for arc Q3/-5
Taking VIR NP arc Q3/-4
(NP (NPR JOHN)) removed from HOLD list
Setting OBJ to (NP (NPR JOHN))



Entering state Q4/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 13 for arc Q4/-2
Taking WRD BY arc Q4/-1
Setting AGFLAG to NIL

Entering state Q7/
Node = (((FRED NPR (&)) NIL))
Taking PUSH arc Q7/-1
PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to FRED

Entering state NP/3
Node = (NIL)
Taking POP arc NP/3-1
Trying to POP
(Continuing arc Q7/-1-PUSH)
Setting SUBJ to (NP (NPR FRED))

Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
(Continuing arc Q5/-1-PUSH)
Setting OBJ to (S DCL (NP (NPR FRED))
                  (TNS (PAST PERFECT))
                  (VP (V SHOOT) (NP (NPR JOHN))))



Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED

S DCL
  NP PRO SOMEONE
  TNS PAST
  VP V BELIEVE
     S DCL
       NP NPR FRED
       TNS PAST PERFECT
       VP V SHOOT
          NP NPR JOHN

    One successful parse. The parser continues because it was being
    run in a mode which returns all possible parses.



Starting alternative 13
At arc Q4/-2
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Taking POP arc Q4/-3
Trying to POP
(Continuing arc Q5/-1-PUSH)
Setting OBJ to (S DCL (NP (PRO SOMEONE))
                  (TNS (PAST PERFECT))
                  (VP (V SHOOT) (NP (NPR JOHN))))

Entering state Q6/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 15 for arc Q6/-2
Taking WRD BY arc Q6/-1
Setting AGFLAG to NIL

Entering state Q7/
Node = (((FRED NPR (&)) NIL))
Taking PUSH arc Q7/-1
PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to FRED

Entering state NP/3
Node = (NIL)
Taking POP arc NP/3-1
Trying to POP
(Continuing arc Q7/-1-PUSH)
Setting SUBJ to (NP (NPR FRED))

Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED

S DCL
  NP NPR FRED
  TNS PAST
  VP V BELIEVE
     S DCL
       NP PRO SOMEONE
       TNS PAST PERFECT
       VP V SHOOT
          NP NPR JOHN

    Second possible parse.

Starting alternative 15
At arc Q6/-2
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED
BLOCKED

Starting alternative 12
At arc Q3/-5
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
BLOCKED

Starting alternative 9
At arc Q3/-2
Node = (((SHOT V (&)) ((BY PREP &) (& NIL))))
BLOCKED

Starting alternative 8
At arc Q3/-3
Node = (((BEEN V (&)) ((SHOT V &) (& &))))
BLOCKED

Starting alternative 6
At arc Q4/-3
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Taking POP arc Q4/-3
Trying to POP
Trying to SUCCEED
BLOCKED

Starting alternative 5
At arc Q3/-5
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
BLOCKED

Starting alternative 2
At arc Q3/-2
Node = (((BELIEVED V (&)) ((TO PREP &) (& &))))
BLOCKED
NIL
Appendix L
Grammar Compiler Declarations



Specification of Features



Some features of the general ATN parser require a good deal of
bookkeeping. For example, SYSCONJ requires a parser to save the path that
it takes through the grammar. This more than doubles the amount of storage
overhead. To relieve the burden of those features, such as SYSCONJ, which
increase the overhead and which a particular application may not require,
the user can specify which features his grammar uses. The compiler will
then tailor the object code to those needs. The user specifications
consist of a collection of flags which are set at compile time. A
description of each flag together with its default setting is given below.



HOLDFLG: If the grammar does not use the HOLD facility, setting this flag
to NIL will eliminate one field in a configuration. Default is T.

FEATURESFLG: If the grammar doesn't use the feature facility, setting this
flag to NIL will eliminate one field in a configuration. Default is T.

WFSTFLG: If the grammar uses the well-formed substring feature, WFSTFLG
should be non-NIL. Default is NIL.

ALTCATSFLG: If this flag is NIL, the compiler will not compile the ability
to handle multiple interpretations of a word within a single category. If
ALTCATSFLG is a list, it will compile this ability into those CAT arcs
whose categories are members of the list. If T, it will compile this
ability into all CAT arcs. Default is T.

SYSCONJFLG: If the grammar uses the LUNAR SYSCONJ conjunction-handling
facility, SYSCONJFLG should be non-NIL. Default is NIL. (SYSCONJ has not
been implemented yet.)

STARTSTATE: This should be the start state of the grammar. Default value
is S/.

NULLPUSHFLG: If NULLPUSHFLG is non-NIL, a PUSH arc will never be taken if
there is no input left. Default setting is T.

UNAMBIGUOUS-CHART: If the input chart is never ambiguous, setting this
flag to a non-NIL value will avoid the checking for an alternative lexical
interpretation. Default is NIL.



1
This begins to legislate out PUSHes which do not use any of the
inputs. In practical terms, this means that a PUSHed-to network has to do
more than just take constituents off the hold list. In theoretical terms,
it closes one of the holes which may allow an ATN grammar to be
undecidable.
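
As an illustration only, the flags above and their stated defaults can be collected into a small Python record. This is a sketch under assumptions, not the compiler's actual data structure: the class name `CompilerFlags` and both helper functions are hypothetical, and the list of configuration fields is inferred from the CF.STATE, CF.NODE, CF.REGS, CF.STACK, CF.HOLD and CF.FEATS accessors that appear in the Version II listing.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class CompilerFlags:
    """Hypothetical record of the compile-time flags with the stated defaults."""
    holdflg: bool = True
    featuresflg: bool = True
    wfstflg: bool = False
    altcatsflg: Union[bool, List[str]] = True
    sysconjflg: bool = False
    startstate: str = "S/"
    nullpushflg: bool = True
    unambiguous_chart: bool = False


def configuration_fields(flags):
    """Fields kept in a saved configuration (assumed from the CF.* accessors)."""
    fields = ["STATE", "NODE", "REGS", "STACK"]
    if flags.holdflg:          # HOLDFLG = NIL drops one field
        fields.append("HOLD")
    if flags.featuresflg:      # FEATURESFLG = NIL drops one field
        fields.append("FEATS")
    return fields


def handles_alt_cats(flags, category):
    """Whether a CAT arc for `category` gets the multiple-interpretation code."""
    if isinstance(flags.altcatsflg, list):
        return category in flags.altcatsflg
    return bool(flags.altcatsflg)
```

For example, a grammar with no HOLD actions that needs alternative categories only for verbs would be described by `CompilerFlags(holdflg=False, altcatsflg=["V"])`, which yields a five-field configuration.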



Declarations for Arc Tests and Actions

The tests and actions on an arc can be arbitrary LISP expressions. To
compile these function calls, the grammar compiler must know which
arguments get evaluated. In general the grammar compiler gets this
information from the same declarations about functions that the LISP
compiler uses (NLAMA, NLAML, FNTYPE, etc.). In addition a facility is
provided which allows the user to tell the grammar compiler how to compile
the individual arguments to particular functions. Using this facility it
is possible to write function calls in the grammar which implicitly QUOTE
some of their arguments and evaluate others or even which call another
function to decode their arguments. The compiler is told how to compile
the arguments to a function by putting a specification as the value of the
property GRAMMARARGINFO on the property list of the function name. The
value of the GRAMMARARGINFO property should be one of the following:

1) LAMBDA: the function evaluates all of its arguments. (This is the
default case.)

2) NLAMBDA: the function doesn't evaluate any of its arguments. This
can also be done by putting the function on either of the lists NLAMA
or NLAML (see INTERLISP compiler).

3) A list which specifies how each argument should be treated. Each
element of the list can be:

    1) E or NIL - This argument position will be evaluated. This is the
    usual case where the action expects its argument to be evaluated and
    tells the grammar compiler to scan the argument for embedded calls.

    2) QUOTE - This argument is embedded in QUOTE. This provides a
    convenient way of automatically quoting certain argument positions
    in a function call.

    3) * - The argument is not compiled by the grammar compiler but is
    merely copied. (Note: Arguments which occur in this position should
    not have any embedded calls, as these will not be scanned by the
    compiler.)

    4) Any other atom - The atom is the name of a function which when
    APPLYed to the argument returns the compiled form.
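
The rules above can be sketched as a small argument compiler. The Python below is an illustration under assumptions, not the grammar compiler itself: forms are modeled as nested Python lists, the function name `compile_form` and the `decoders` table are hypothetical, and treating argument positions beyond the end of a specification list as evaluated is our assumption.

```python
def compile_form(form, arginfo, decoders=None):
    """Compile one arc test or action according to its GRAMMARARGINFO.

    form     -- nested list such as ["SETR", "ANAPHORFLG", "T"]
    arginfo  -- dict: function name -> "LAMBDA", "NLAMBDA", or a list of specs
    decoders -- dict: atom name -> Python callable playing the APPLYed function
    """
    decoders = decoders or {}
    if not isinstance(form, list):
        return form                      # atoms are left as they are
    fn, args = form[0], form[1:]
    spec = arginfo.get(fn, "LAMBDA")     # LAMBDA is the default case
    if spec == "NLAMBDA":
        return form                      # no argument is evaluated or scanned
    if spec == "LAMBDA":
        spec = ["E"] * len(args)
    compiled = [fn]
    for i, arg in enumerate(args):
        how = spec[i] if i < len(spec) else "E"   # assumption: extras are evaluated
        if how in ("E", None):                    # E or NIL: scan embedded calls
            compiled.append(compile_form(arg, arginfo, decoders))
        elif how == "QUOTE":                      # wrap the argument in QUOTE
            compiled.append(["QUOTE", arg])
        elif how == "*":                          # copy without scanning
            compiled.append(arg)
        else:                                     # any other atom: call a decoder
            compiled.append(decoders[how](arg))
    return compiled
```

With a specification of `["QUOTE", "E"]` for SETR, `compile_form(["SETR", "ANAPHORFLG", "T"], {"SETR": ["QUOTE", "E"]})` produces the list form of (SETR (QUOTE ANAPHORFLG) T).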



Examples: The grammar function SETR which sets the value of a register
could be compiled by having a GRAMMARARGINFO property of (QUOTE E). The
arc action (SETR ANAPHORFLG T) would compile into (SETR (QUOTE ANAPHORFLG)
T). SETR is defined as a LAMBDA function (i.e. the interpreter evaluates
its arguments) which avoids the explicit call to EVAL which results from
having SETR be a NLAMBDA function (i.e. the interpreter doesn't evaluate
its arguments).

2
SETR is, in fact, recognized specially by the grammar compiler so
that it can keep track of the registers which are used in the grammar.

In the LUNAR grammar, many of the arc functions use EVALLOC to
evaluate one or more of their arguments. EVALLOC has three options: (1)
if its argument is * or NIL, it gets the value of the current 'thing';
(2) if the argument is atomic, it is a register whose value is retrieved;
and (3) if the argument is a list, it is evaluated. This allows the
grammar to be clearer and less cluttered with predictable function calls.
To accomplish the same results using the compiler, a version of EVALLOC
(CEVALLOC) is provided which returns the form for the decoded argument.
The functions which use it are then given a GRAMMARARGINFO property of
CEVALLOC for those argument positions which need decoding. This means that
the decoding process takes place once at compile time instead of each time
the arc is tried. For example, in the LUNAR grammar the function MARKER
has a GRAMMARARGINFO property of (CEVALLOC QUOTE). This allows the grammar
to have (MARKER N MASS) as an action which compiles into
(MARKER (GETR N) (QUOTE MASS)) and avoids an explicit call to EVAL by
MARKER. Notice that by using this technique the grammar writer can easily
specify default arguments to actions in his grammar (at very little
computational cost) and greatly improve the readability of the grammar.
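
The compile-time decoding described above can be sketched in a few lines of Python. This is a paraphrase under assumptions rather than the INTERLISP definition of CEVALLOC: forms are modeled as nested lists, NIL is modeled as `None` or the string "NIL", and `compile_marker` is a hypothetical helper that hard-codes the (CEVALLOC QUOTE) specification given for MARKER.

```python
def cevalloc(arg):
    """Return the form that, when evaluated, yields what EVALLOC would compute."""
    if isinstance(arg, list):
        return arg                 # option 3: a list is evaluated as written
    if arg in (None, "NIL", "*"):
        return "*"                 # option 1: the current 'thing'
    return ["GETR", arg]           # option 2: an atom names a register


def compile_marker(form):
    """Compile (MARKER loc feature) using the spec (CEVALLOC QUOTE)."""
    fn, loc, feature = form
    return [fn, cevalloc(loc), ["QUOTE", feature]]
```

Under these assumptions `compile_marker(["MARKER", "N", "MASS"])` gives the list form of (MARKER (GETR N) (QUOTE MASS)), so the register lookup is fixed once when the grammar is compiled instead of being decoded on every attempt at the arc.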



